Abstract

We consider the classical two-encoder multiterminal source coding 
problem where distortion is measured under logarithmic loss. We pro- 
vide a single-letter description of the achievable rate distortion region for 
arbitrarily correlated sources with finite alphabets. In doing so, we also 
give the rate distortion region for the m-encoder CEO problem (also under 
logarithmic loss). Several applications and examples are given. 

1 Introduction 

A complete characterization of the achievable rate distortion region for the two-
encoder source coding problem depicted in Figure 1 has remained an open prob-
lem for over three decades. Following tradition, we will refer to this two-encoder
source coding network as the multiterminal source coding problem throughout 
this paper. Several special cases have been solved for general source alphabets 
and distortion measures: 

• The lossless case where $D_1 = 0$, $D_2 = 0$. Slepian and Wolf solved this case
in their seminal work [1].

• The case where one source is recovered losslessly: i.e., $D_1 = 0$, $D_2 =
D_{\max}$. This case corresponds to the source coding with side information
problem of Ahlswede-Körner-Wyner [2], [3].

• The Wyner-Ziv case [4] where $Y_2$ is available to the decoder as side infor-
mation and $Y_1$ should be recovered with distortion at most $D_1$.

• The Berger-Yeung case (which subsumes the previous three cases) [5]
where $D_1$ is arbitrary and $D_2 = 0$.



Figure 1: The multiterminal source coding network. 



Despite the apparent progress, other seemingly fundamental cases, such as
when $D_1$ is arbitrary and $D_2 = D_{\max}$, remain unsolved except perhaps in very
special cases.

Recently, the achievable rate distortion region for the quadratic Gaussian 
multiterminal source coding problem was given by Wagner, Tavildar, and 
Viswanath [6]. Until now, this was the only case for which the entire achievable
rate distortion region was known. While this is a very important result, it is 
again a special case from a theoretical point of view: a specific choice of source 
distribution, and a specific choice of distortion measure. 

In the present paper, we determine the achievable rate distortion region 
of the multiterminal source coding problem for arbitrarily correlated sources 
with finite alphabets. However, as in [6], we restrict our attention to a specific 
distortion measure. 

At a high level, the roadmap for our argument is similar to that of [6]. In
particular, both arguments couple the multiterminal source coding problem to a 
parametrized family of CEO problems. Then, the parameter in the CEO prob- 
lem is "tuned" to yield the converse result. Despite this apparent similarity, the 
proofs in [6] rely heavily on the previously known Gaussian CEO results [7], the
Gaussian one-helper results [8], and the calculus performed on the closed-form
entropy expressions which arise from the Gaussian source assumption. In our 
case we do not have this luxury, and our CEO tuning argument essentially relies 
on an existence lemma to yield the converse result. The success of our approach 
is largely due to the fact that the distortion measure we consider admits a lower 
bound in the form of a conditional entropy, much like the quadratic distortion 
measure for Gaussian sources. 

1.1 Our Contributions 

In this paper, we give a single-letter characterization of the achievable rate dis- 
tortion region for the multiterminal source coding problem under logarithmic 
loss. In the process of accomplishing this, we derive the achievable rate distor- 
tion region for the m-encoder CEO problem, also under logarithmic loss. In 
both settings, we obtain a stronger converse than is standard for rate distor- 
tion problems in the sense that augmenting the reproduction alphabet does not 
enlarge the rate distortion region. Notably, we make no assumptions on the
source distributions, other than that the sources have finite alphabets. In both 
cases, the Berger-Tung inner bound on the rate distortion region is tight. To 
our knowledge, this constitutes the first time that the entire achievable rate 
distortion region has been described for general finite-alphabet sources under 
nontrivial distortion constraints. 

1.2 Organization 

This paper is organized as follows. In Section 2 we formally define the
logarithmic loss function and the multiterminal source coding problem we
consider. In Section 3 we define the CEO problem and give the rate distortion
region under logarithmic loss. In Section 4 we return to the multiterminal
source coding problem and derive the rate distortion region for the two-encoder
setting. Also in Sections 3 and 4, applications to estimation, horse racing, and
list decoding are given. In Section 5 we discuss connections between our results
and the multiterminal source coding problem with arbitrary distortion measures.
Section 6 delivers our concluding remarks and discusses directions for future work.

2 Problem Definition 

Throughout this paper, we adopt notational conventions that are standard in 
the literature. Specifically, random variables are denoted by capital letters (e.g., 
X) and their corresponding alphabets are denoted by corresponding calligraphic 
letters (e.g., $\mathcal{X}$). We abbreviate a sequence $(X_1, X_2, \ldots, X_n)$ of $n$ random
variables by $X^n$, and we denote the interval $(X_k, X_{k+1}, \ldots, X_j)$ by $X_k^j$. If the
lower index is equal to 1, it will be omitted when there is no ambiguity (e.g.,
$X^j \triangleq X_1^j$). Frequently, random variables will appear with two subscripts (e.g.,
$Y_{i,j}$). In this case, we are referring to the $j$th instance of random variable $Y_i$.
We overload our notation here slightly in that $Y_{i,1}^j$ is often abbreviated as $Y_i^j$.
However, our meaning will always be clear from context. 

Let $\{(Y_{1,j}, Y_{2,j})\}_{j=1}^n = (Y_1^n, Y_2^n)$ be a sequence of $n$ independent, identically
distributed random variables with finite alphabets $\mathcal{Y}_1$ and $\mathcal{Y}_2$ respectively and
joint pmf $p(y_1, y_2)$. That is, $(Y_1^n, Y_2^n) \sim \prod_{j=1}^n p(y_{1,j}, y_{2,j})$.

In this paper, we take the reproduction alphabet $\hat{\mathcal{Y}}_i$ to be equal to the set
of probability distributions over the source alphabet $\mathcal{Y}_i$ for $i = 1, 2$. Thus, for
a vector $\hat{Y}_i^n \in \hat{\mathcal{Y}}_i^n$, we will use the notation $\hat{Y}_{i,j}(y_i)$ to mean the $j$th coordinate
($1 \le j \le n$) of $\hat{Y}_i^n$ (which is a probability distribution on $\mathcal{Y}_i$) evaluated for the
outcome $y_i \in \mathcal{Y}_i$. In other words, the decoder generates 'soft' estimates of the
source sequences.

We will consider the logarithmic loss distortion measure defined as follows: 

$$d(y_i, \hat{y}_i) = \log\left(\frac{1}{\hat{y}_i(y_i)}\right) = D\left(\mathbb{1}_{\{y_i\}}(y) \,\big\|\, \hat{y}_i(y)\right) \quad \text{for } i = 1, 2.$$

In particular, $d(y_i, \hat{y}_i)$ is the relative entropy (i.e., Kullback-Leibler divergence)
between the empirical distribution of the event $\{Y_i = y_i\}$ and the estimate
$\hat{y}_i$. Using this definition for symbol-wise distortion, it is standard to define the
distortion between sequences as

$$d(y_i^n, \hat{y}_i^n) = \frac{1}{n} \sum_{j=1}^n d(y_{i,j}, \hat{y}_{i,j}).$$
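As a small illustration of the two definitions above, the following sketch evaluates the symbol-wise and sequence-wise logarithmic loss for soft estimates stored as dictionaries. The function names, the base-2 logarithm, and the toy data are assumptions made for this example only; they are not taken from the paper.

```python
import math

def log_loss(y, y_hat):
    """Symbol-wise logarithmic loss d(y, y_hat) = log2(1 / y_hat(y)), in bits.

    y_hat is a dict mapping each source symbol to its estimated probability.
    """
    p = y_hat.get(y, 0.0)
    return math.inf if p == 0.0 else -math.log2(p)

def sequence_log_loss(ys, y_hats):
    """Average symbol-wise loss over a block: (1/n) * sum_j d(y_j, y_hat_j)."""
    if len(ys) != len(y_hats):
        raise ValueError("sequence and reproduction must have equal length")
    return sum(log_loss(y, q) for y, q in zip(ys, y_hats)) / len(ys)

# Hypothetical binary block: a confident correct estimate, an uninformative
# estimate, and a hedged estimate that leans the right way.
source = [0, 1, 1]
soft = [{0: 1.0, 1: 0.0}, {0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}]
avg = sequence_log_loss(source, soft)
```

A soft estimate that assigns zero probability to the realized symbol incurs infinite loss, which is why a decoder under this measure has an incentive to hedge.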

We point out that the logarithmic loss function is a widely used penalty
function in the theory of learning and prediction (cf. [9, Chapter 9]). Further,
it is a particularly natural loss criterion in settings where the reconstructions
are allowed to be 'soft', rather than deterministic values. Surprisingly, given that
distributed learning and estimation problems are some of the most oft-cited
applications of lossy multiterminal source coding, it does not appear to have been
studied in this context until the recent work [10]. However, we note that this
connection has been established previously for the single-encoder case in the
study of the information bottleneck problem [11]. Beyond learning and
prediction, a similar distortion measure has appeared before in the image processing
literature [12]. As we demonstrate through several examples, the logarithmic
loss distortion measure has a variety of useful applications in the context of
multiterminal source coding.

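A rate distortion code (of blocklength $n$) consists of encoding functions:

$$g_i^{(n)} : \mathcal{Y}_i^n \to \{1, \ldots, M_i^{(n)}\} \quad \text{for } i = 1, 2$$

and decoding functions

$$\psi_i^{(n)} : \{1, \ldots, M_1^{(n)}\} \times \{1, \ldots, M_2^{(n)}\} \to \hat{\mathcal{Y}}_i^n \quad \text{for } i = 1, 2.$$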

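A rate distortion vector $(R_1, R_2, D_1, D_2)$ is strict-sense achievable if there
exists a blocklength $n$, encoding functions $g_1^{(n)}, g_2^{(n)}$ and a decoder
$(\psi_1^{(n)}, \psi_2^{(n)})$ such that

$$R_i \ge \frac{1}{n} \log M_i^{(n)} \quad \text{for } i = 1, 2$$

$$D_i \ge \mathbb{E}\, d(Y_i^n, \hat{Y}_i^n) \quad \text{for } i = 1, 2,$$

where

$$\hat{Y}_i^n = \psi_i^{(n)}\left(g_1^{(n)}(Y_1^n), g_2^{(n)}(Y_2^n)\right) \quad \text{for } i = 1, 2.$$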



Definition 1. Let $\mathcal{RD}^{\star}$ denote the set of strict-sense achievable rate distortion
vectors and define the set of achievable rate distortion vectors to be its closure,
$\overline{\mathcal{RD}}^{\star}$.

Our ultimate goal in the present paper is to give a single-letter characterization
of the region $\overline{\mathcal{RD}}^{\star}$. However, in order to do this, we first consider an
associated CEO problem. In this sense, the roadmap for our argument is similar
to that of [6]. Specifically, both arguments couple the multiterminal source coding
problem to a parametrized family of CEO problems. Then, the parameter in
the CEO problem is "tuned" to yield the converse result. Despite this apparent
similarity, the proofs are quite different since the results in [6] depend heavily
on the peculiarities of the Gaussian distribution.

3 The CEO problem 

In order to attack the general multiterminal problem, we begin by studying
the CEO problem (see [13] for an introduction). To this end, let
$\{(X_j, Y_{1,j}, Y_{2,j})\}_{j=1}^n = (X^n, Y_1^n, Y_2^n)$ be a sequence of $n$ independent,
identically distributed random variables distributed according to the joint pmf
$p(x, y_1, y_2) = p(x)p(y_1|x)p(y_2|x)$. That is, $Y_1 \leftrightarrow X \leftrightarrow Y_2$ form a Markov chain,
in that order.

In this section, we consider the reproduction alphabet $\hat{\mathcal{X}}$ to be equal to the
set of probability distributions over the source alphabet $\mathcal{X}$. As before, for a
vector $\hat{X}^n \in \hat{\mathcal{X}}^n$, we will use the notation $\hat{X}_j(x)$ to mean the $j$th coordinate of
$\hat{X}^n$ (which is a probability distribution on $\mathcal{X}$) evaluated for the outcome $x \in \mathcal{X}$.
As in the rest of this paper, $d(\cdot, \cdot)$ is the logarithmic loss distortion measure.

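A rate distortion CEO code (of blocklength $n$) consists of encoding functions:

$$g_i^{(n)} : \mathcal{Y}_i^n \to \{1, \ldots, M_i^{(n)}\} \quad \text{for } i = 1, 2$$

and a decoding function

$$\psi^{(n)} : \{1, \ldots, M_1^{(n)}\} \times \{1, \ldots, M_2^{(n)}\} \to \hat{\mathcal{X}}^n.$$

A rate distortion vector $(R_1, R_2, D)$ is strict-sense achievable for the CEO
problem if there exists a blocklength $n$, encoding functions $g_1^{(n)}, g_2^{(n)}$ and a
decoder $\psi^{(n)}$ such that

$$R_i \ge \frac{1}{n} \log M_i^{(n)} \quad \text{for } i = 1, 2$$

$$D \ge \mathbb{E}\, d(X^n, \hat{X}^n),$$

where

$$\hat{X}^n = \psi^{(n)}\left(g_1^{(n)}(Y_1^n), g_2^{(n)}(Y_2^n)\right).$$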

Definition 2. Let $\mathcal{RD}^{\star}_{\mathrm{CEO}}$ denote the set of strict-sense achievable rate
distortion vectors and define the set of achievable rate distortion vectors to be its
closure, $\overline{\mathcal{RD}}^{\star}_{\mathrm{CEO}}$.
3.1 Inner Bound



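Definition 3. Let $(R_1, R_2, D) \in \mathcal{RD}^{i}_{\mathrm{CEO}}$ if and only if there exists a joint
distribution of the form

$$p(x, y_1, y_2)\,p(u_1|y_1, q)\,p(u_2|y_2, q)\,p(q)$$

where $|\mathcal{U}_1| \le |\mathcal{Y}_1|$, $|\mathcal{U}_2| \le |\mathcal{Y}_2|$, and $|\mathcal{Q}| \le 4$, which satisfies

$$\begin{aligned}
R_1 &\ge I(Y_1; U_1 | U_2, Q) \\
R_2 &\ge I(Y_2; U_2 | U_1, Q) \\
R_1 + R_2 &\ge I(U_1, U_2; Y_1, Y_2 | Q) \\
D &\ge H(X | U_1, U_2, Q).
\end{aligned}$$

Theorem 1. $\mathcal{RD}^{i}_{\mathrm{CEO}} \subseteq \overline{\mathcal{RD}}^{\star}_{\mathrm{CEO}}$. That is, all rate distortion vectors
$(R_1, R_2, D) \in \mathcal{RD}^{i}_{\mathrm{CEO}}$ are achievable.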

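Before proceeding with the proof, we cite the following variant of a well-
known inner bound:

Proposition 1 (Berger-Tung Inner Bound [14], [15]). The rate distortion vector
$(R_1, R_2, D)$ is achievable if

$$\begin{aligned}
R_1 &\ge I(U_1; Y_1 | U_2, Q) \\
R_2 &\ge I(U_2; Y_2 | U_1, Q) \\
R_1 + R_2 &\ge I(U_1, U_2; Y_1, Y_2 | Q) \\
D &\ge \mathbb{E}\left[d(X, f(U_1, U_2, Q))\right]
\end{aligned}$$

for a joint distribution

$$p(x)\,p(y_1|x)\,p(y_2|x)\,p(u_1|y_1, q)\,p(u_2|y_2, q)\,p(q)$$

and reproduction function

$$f : \mathcal{U}_1 \times \mathcal{U}_2 \times \mathcal{Q} \to \hat{\mathcal{X}}.$$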

The proof of this proposition is a standard exercise in information theory, 
and is therefore omitted. The interested reader is directed to the text [16] for
a modern, detailed treatment. The proposition follows from what is commonly
called the Berger-Tung achievability scheme. In this encoding scheme, each
encoder quantizes its observation $Y_i^n$ to a codeword $U_i^n$, such that the empirical
distribution of the entries in $(Y_i^n, U_i^n)$ is very close to the true distribution
$p(y_i, u_i)$. In order to communicate their respective quantizations to the
decoder, the encoders essentially perform Slepian-Wolf coding. For this reason,
the Berger-Tung achievability scheme is also referred to as a "quantize-and-bin" 
coding scheme. 
Proof of Theorem 1. Given Proposition 1, the proof of Theorem 1 is immediate.
Indeed, if we apply Proposition 1 with the reproduction function $f(U_1, U_2, Q) \triangleq
\Pr[X = x | U_1, U_2, Q]$, we note that

$$\mathbb{E}\left[d(X, f(U_1, U_2, Q))\right] = H(X | U_1, U_2, Q),$$

which yields the desired result. □

Thus, from the proof of Theorem 1 we see that our inner bound $\mathcal{RD}^{i}_{\mathrm{CEO}}$
simply corresponds to a specialization of the general Berger-Tung inner bound
to the case of logarithmic loss.
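The identity used in the proof of Theorem 1 can be checked numerically on a small example. The sketch below is an illustration under assumptions, not part of the paper: a single variable `u` stands in for the triple $(U_1, U_2, Q)$, the joint pmf is made up, and logarithms are taken base 2.

```python
import math
from collections import defaultdict

def expected_log_loss(joint, estimator):
    """E[d(X, f(U))] in bits, for joint = {(x, u): prob} and f(u) = {x: prob}."""
    return sum(p * -math.log2(estimator(u)[x])
               for (x, u), p in joint.items() if p > 0)

def posterior(joint):
    """Return the Bayes posterior u -> {x: Pr[X = x | U = u]}."""
    p_u = defaultdict(float)
    for (x, u), p in joint.items():
        p_u[u] += p
    table = defaultdict(dict)
    for (x, u), p in joint.items():
        table[u][x] = p / p_u[u]
    return lambda u: table[u]

def conditional_entropy(joint):
    """H(X|U) in bits, computed directly from the joint pmf."""
    p_u = defaultdict(float)
    for (x, u), p in joint.items():
        p_u[u] += p
    return -sum(p * math.log2(p / p_u[u])
                for (x, u), p in joint.items() if p > 0)

# Hypothetical joint pmf of (X, U); U stands in for (U1, U2, Q).
joint = {(0, 'a'): 0.4, (1, 'a'): 0.1, (0, 'b'): 0.2, (1, 'b'): 0.3}
loss_bayes = expected_log_loss(joint, posterior(joint))
loss_uniform = expected_log_loss(joint, lambda u: {0: 0.5, 1: 0.5})
```

Here `loss_bayes` agrees with `conditional_entropy(joint)` up to floating-point error, while the uninformative estimator pays a full bit per symbol.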

3.2 A Matching Outer Bound 

A particularly useful property of the logarithmic loss distortion measure is that 
the expected distortion is lower-bounded by a conditional entropy. A similar 
property is enjoyed by Gaussian random variables under quadratic distortion. 
In particular, if $G$ is Gaussian, and $\hat{G}$ is such that $\mathbb{E}(\hat{G} - G)^2 \le D$, then
$\frac{1}{2}\log(2\pi e D) \ge h(G \mid \hat{G})$. The case for logarithmic loss is similar, and we state it
formally in the following lemma which is crucial in the proof of the converse.

Lemma 1. Let $Z = \left(g_1^{(n)}(Y_1^n), g_2^{(n)}(Y_2^n)\right)$ be the argument of the reproduction
function $\psi^{(n)}$. Then $n\,\mathbb{E}\, d(X^n, \hat{X}^n) \ge H(X^n | Z)$.

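Proof. By definition of the reproduction alphabet, we can consider the
reproduction $\hat{X}^n$ to be a probability distribution on $\mathcal{X}^n$ conditioned on the argument
$Z$. In particular, if $\hat{x}^n = \psi^{(n)}(z)$, define $s(x^n|z) \triangleq \prod_{j=1}^n \hat{x}_j(x_j)$. It is readily
verified that $s$ is a probability measure on $\mathcal{X}^n$. Then, we obtain the following
lower bound on the expected distortion conditioned on $Z = z$:

$$\begin{aligned}
\mathbb{E}\left[d(X^n, \hat{X}^n) \,\middle|\, Z = z\right]
&= \frac{1}{n} \sum_{j=1}^n \sum_{x^n \in \mathcal{X}^n} p(x^n|z) \log\frac{1}{\hat{x}_j(x_j)} \\
&= \frac{1}{n} \sum_{x^n \in \mathcal{X}^n} p(x^n|z) \sum_{j=1}^n \log\frac{1}{\hat{x}_j(x_j)} \\
&= \frac{1}{n} \sum_{x^n \in \mathcal{X}^n} p(x^n|z) \log\frac{1}{s(x^n|z)} \\
&= \frac{1}{n} \sum_{x^n \in \mathcal{X}^n} p(x^n|z) \log\frac{p(x^n|z)}{s(x^n|z)} + \frac{1}{n} H(X^n | Z = z) \\
&= \frac{1}{n} D\left(p(x^n|z) \,\|\, s(x^n|z)\right) + \frac{1}{n} H(X^n | Z = z) \\
&\ge \frac{1}{n} H(X^n | Z = z),
\end{aligned}$$

where $p(x^n|z) = \Pr(X^n = x^n | Z = z)$ is the true conditional distribution.
Averaging both sides over all values of $Z$, we obtain the desired result. □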
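The decomposition at the heart of the proof of Lemma 1 can be verified numerically for a fixed $z$. The sketch below is only an illustration: the blocklength $n = 2$, the correlated conditional pmf `p`, and the product reproduction `hat` are hypothetical choices, and logarithms are base 2.

```python
import math

def lemma1_terms(p, marginals):
    """Return (expected loss, D(p||s)/n, H(X^n|Z=z)/n), all in bits.

    p         : {x^n tuple: Pr(X^n = x^n | Z = z)} for one fixed z
    marginals : list of n dicts, the product reproduction (one per coordinate)
    """
    n = len(marginals)
    # s(x^n | z) is the product of the per-coordinate soft estimates.
    s = {xs: math.prod(m[x] for m, x in zip(marginals, xs)) for xs in p}
    # Since log s is the sum of per-coordinate logs, this equals the
    # average per-coordinate logarithmic loss.
    loss = sum(px * -math.log2(s[xs]) for xs, px in p.items()) / n
    kl = sum(px * math.log2(px / s[xs]) for xs, px in p.items() if px > 0) / n
    ent = -sum(px * math.log2(px) for px in p.values() if px > 0) / n
    return loss, kl, ent

# Hypothetical correlated conditional pmf on {0,1}^2 and a product estimate
# built from its true marginals.
p = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}
hat = [{0: 0.6, 1: 0.4}, {0: 0.6, 1: 0.4}]
loss, kl, ent = lemma1_terms(p, hat)
```

The gap between `loss` and `ent` is exactly the normalized divergence `kl`, which is nonnegative and here strictly positive because `p` is not a product distribution.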
Definition 4. Let $(R_1, R_2, D) \in \mathcal{RD}^{o}_{\mathrm{CEO}}$ if and only if there exists a joint
distribution of the form

$$p(x)\,p(y_1|x)\,p(y_2|x)\,p(u_1|y_1, q)\,p(u_2|y_2, q)\,p(q),$$

which satisfies

$$\begin{aligned}
R_1 &\ge I(Y_1; U_1 | X, Q) + H(X | U_2, Q) - D \\
R_2 &\ge I(Y_2; U_2 | X, Q) + H(X | U_1, Q) - D \\
R_1 + R_2 &\ge I(U_1; Y_1 | X, Q) + I(U_2; Y_2 | X, Q) + H(X) - D \\
D &\ge H(X | U_1, U_2, Q).
\end{aligned} \qquad (1)$$

Theorem 2. If $(R_1, R_2, D)$ is strict-sense achievable for the CEO problem, then
$(R_1, R_2, D) \in \mathcal{RD}^{o}_{\mathrm{CEO}}$.

Proof. Suppose the point $(R_1, R_2, D)$ is strict-sense achievable. Let $A$ be a
nonempty subset of $\{1, 2\}$, and let $F_i = g_i^{(n)}(Y_i^n)$ be the message sent by encoder
$i \in \{1, 2\}$. Define $U_{i,j} \triangleq (F_i, Y_i^{j-1})$ and $Q_j \triangleq (X^{j-1}, X_{j+1}^n) = X^n \setminus X_j$. To
simplify notation, let $Y_A = \cup_{i \in A} Y_i$ (similarly for $U_A$ and $F_A$).

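With these notations established, we have the following string of inequalities:

$$\begin{aligned}
n \sum_{i \in A} R_i
&\ge H(F_A) \\
&\ge I(Y_A^n; F_A | F_{A^c}) \\
&= I(X^n, Y_A^n; F_A | F_{A^c}) \qquad (2) \\
&= I(X^n; F_A | F_{A^c}) + \sum_{i \in A} I(F_i; Y_i^n | X^n) \qquad (3) \\
&= H(X^n | F_{A^c}) - H(X^n | F_1, F_2) + \sum_{i \in A} \sum_{j=1}^n I(Y_{i,j}; F_i | X^n, Y_i^{j-1}) \\
&\ge H(X^n | F_{A^c}) + \sum_{i \in A} \sum_{j=1}^n I(Y_{i,j}; F_i | X^n, Y_i^{j-1}) - nD \qquad (4) \\
&= \sum_{j=1}^n H(X_j | F_{A^c}, X^{j-1}) + \sum_{i \in A} \sum_{j=1}^n I(Y_{i,j}; F_i | X^n, Y_i^{j-1}) - nD \\
&= \sum_{j=1}^n H(X_j | F_{A^c}, X^{j-1}) + \sum_{i \in A} \sum_{j=1}^n I(Y_{i,j}; U_{i,j} | X_j, Q_j) - nD \qquad (5) \\
&\ge \sum_{j=1}^n H(X_j | U_{A^c,j}, Q_j) + \sum_{i \in A} \sum_{j=1}^n I(Y_{i,j}; U_{i,j} | X_j, Q_j) - nD. \qquad (6)
\end{aligned}$$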

The nontrivial steps above can be justified as follows:

• (2) follows since $F_A$ is a function of $Y_A^n$.

• (3) follows since $F_i$ is a function of $Y_i^n$ and $F_1 \leftrightarrow X^n \leftrightarrow F_2$ form a Markov
chain (since $Y_1^n \leftrightarrow X^n \leftrightarrow Y_2^n$ form a Markov chain).

• (4) follows since $nD \ge H(X^n | F_1, F_2)$ by Lemma 1.

• (5) follows from the Markov chain $Y_{i,j} \leftrightarrow X^n \leftrightarrow Y_i^{j-1}$, which follows from
the i.i.d. nature of the source sequences.

• (6) simply follows from the fact that conditioning reduces entropy.

Therefore, dividing both sides by $n$, we have:

$$\sum_{i \in A} R_i \ge \frac{1}{n} \sum_{j=1}^n H(X_j | U_{A^c,j}, Q_j) + \frac{1}{n} \sum_{i \in A} \sum_{j=1}^n I(Y_{i,j}; U_{i,j} | X_j, Q_j) - D.$$

Also, using Lemma 1 and the fact that conditioning reduces entropy:

$$D \ge \frac{1}{n} H(X^n | F_1, F_2) \ge \frac{1}{n} \sum_{j=1}^n H(X_j | U_{1,j}, U_{2,j}, Q_j).$$

Observe that $Q_j$ is independent of $(X_j, Y_{1,j}, Y_{2,j})$ and, conditioned on $Q_j$, we
have the long Markov chain $U_{1,j} \leftrightarrow Y_{1,j} \leftrightarrow X_j \leftrightarrow Y_{2,j} \leftrightarrow U_{2,j}$. Finally, by a
standard time-sharing argument, we conclude by saying that if $(R_1, R_2, D)$ is
strict-sense achievable for the CEO problem, then

$$\begin{aligned}
R_1 &\ge I(Y_1; U_1 | X, Q) + H(X | U_2, Q) - D \\
R_2 &\ge I(Y_2; U_2 | X, Q) + H(X | U_1, Q) - D \\
R_1 + R_2 &\ge I(U_1; Y_1 | X, Q) + I(U_2; Y_2 | X, Q) + H(X) - D \\
D &\ge H(X | U_1, U_2, Q)
\end{aligned}$$

for some joint distribution $p(q)\,p(x, y_1, y_2)\,p(u_1|y_1, q)\,p(u_2|y_2, q)$. □

Theorem 3. $\mathcal{RD}^{o}_{\mathrm{CEO}} = \mathcal{RD}^{i}_{\mathrm{CEO}} = \overline{\mathcal{RD}}^{\star}_{\mathrm{CEO}}$.

Proof. We first remark that the cardinality bounds on the alphabets in the
definition of $\mathcal{RD}^{i}_{\mathrm{CEO}}$ can be imposed without any loss of generality. This is a
consequence of [17, Lemma 2.2] and is discussed in detail in Appendix A.

Therefore, it will suffice to show $\mathcal{RD}^{o}_{\mathrm{CEO}} \subseteq \mathcal{RD}^{i}_{\mathrm{CEO}}$ without considering
the cardinality bounds. To this end, fix $p(q)$, $p(u_1|y_1, q)$, and $p(u_2|y_2, q)$ and
consider the extreme points¹ of the polytope defined by the inequalities (1):

$$\begin{aligned}
P_1 &= \left(0,\ 0,\ I(Y_1; U_1 | X, Q) + I(Y_2; U_2 | X, Q) + H(X)\right) \\
P_2 &= \left(I(Y_1; U_1 | Q),\ 0,\ I(U_2; Y_2 | X, Q) + H(X | U_1, Q)\right) \\
P_3 &= \left(0,\ I(Y_2; U_2 | Q),\ I(U_1; Y_1 | X, Q) + H(X | U_2, Q)\right) \\
P_4 &= \left(I(Y_1; U_1 | Q),\ I(Y_2; U_2 | U_1, Q),\ H(X | U_1, U_2, Q)\right) \\
P_5 &= \left(I(Y_1; U_1 | U_2, Q),\ I(Y_2; U_2 | Q),\ H(X | U_1, U_2, Q)\right),
\end{aligned}$$

where the point $P_j$ is a triple $(R_1^{(j)}, R_2^{(j)}, D^{(j)})$. We say a point $(R_1^{(j)}, R_2^{(j)}, D^{(j)})$
is dominated by a point in $\mathcal{RD}^{i}_{\mathrm{CEO}}$ if there exists some $(R_1, R_2, D) \in \mathcal{RD}^{i}_{\mathrm{CEO}}$
for which $R_1 \le R_1^{(j)}$, $R_2 \le R_2^{(j)}$, and $D \le D^{(j)}$. Observe that each of the
extreme points $P_1, \ldots, P_5$ is dominated by a point in $\mathcal{RD}^{i}_{\mathrm{CEO}}$:

• First, observe that $P_4$ and $P_5$ are both in $\mathcal{RD}^{i}_{\mathrm{CEO}}$, so these points are not
problematic.

• Next, observe that the point $(0, 0, H(X))$ is in $\mathcal{RD}^{i}_{\mathrm{CEO}}$, which can be
seen by setting all auxiliary random variables to be constant. This point
dominates $P_1$.

• By using auxiliary random variables $(U_1, U_2, Q) = (U_1, \emptyset, Q)$, the point
$(I(Y_1; U_1 | Q), 0, H(X | U_1, Q))$ is in $\mathcal{RD}^{i}_{\mathrm{CEO}}$, and dominates the point $P_2$.
By a symmetric argument, the point $P_3$ is also dominated by a point in
$\mathcal{RD}^{i}_{\mathrm{CEO}}$.

Since $\mathcal{RD}^{o}_{\mathrm{CEO}}$ is the convex hull of all such extreme points (i.e., the convex
hull of the union of extreme points over all appropriate joint distributions), the
theorem is proved. □

Remark 1. Theorem 3 can be extended to the general case of $m$ encoders.
Details are provided in Appendix B.



3.3 A stronger converse result for the CEO problem 

As defined, our reproduction sequence $\hat{X}^n$ is restricted to be a product
distribution on $\mathcal{X}^n$. However, for a blocklength $n$ code, we can allow $\hat{X}^n$ to be any
probability distribution on $\mathcal{X}^n$ and the converse result still holds. In this case,
we define the sequence distortion as follows:

$$d(x^n, \hat{x}^n) = \frac{1}{n} \log\frac{1}{\hat{x}^n(x^n)},$$

which is compatible with the original definition when $\hat{x}^n$ is a product
distribution. The reader can verify that the result of Lemma 1 is still true for this more
general distortion alphabet by setting $s(x^n|z) = \hat{x}^n(x^n)$ in the corresponding
proof. Since Lemma 1 is the key tool in the CEO converse result, this implies
that the converse holds even if $\hat{X}^n$ is allowed to be any probability distribution
on $\mathcal{X}^n$ (rather than being restricted to the set of product distributions).

When this stronger converse result is taken together with the achievability
result, we observe that restricting $\hat{X}^n$ to be a product distribution is in fact
optimal and can achieve all points in $\overline{\mathcal{RD}}^{\star}_{\mathrm{CEO}}$.

¹ For two encoders, it is easy enough to enumerate the extreme points of the
polytope defined by (1) by inspection. However, this can be formalized by a
submodularity argument, which is given in Appendix B.



3.4 An Example: Distributed compression of a posterior 
distribution 

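Suppose two sensors observe sequences $Y_1^n$ and $Y_2^n$ respectively, which are
conditionally independent given a hidden sequence $X^n$. The sensors communicate
with a fusion center through rate-limited links of capacity $R_1$ and $R_2$
respectively. Given sequences $(Y_1^n, Y_2^n)$ are observed, the sequence $X^n$ cannot be
determined in general, so the fusion center would like to estimate the posterior
distribution $p(x^n | Y_1^n, Y_2^n)$. Since the communication links are rate-limited, the
fusion center cannot necessarily compute $p(x^n | Y_1^n, Y_2^n)$ exactly. In this case, the
fusion center would like to generate an estimate $\hat{p}(x^n | g_1^{(n)}(Y_1^n), g_2^{(n)}(Y_2^n))$ that
should approximate $p(x^n | Y_1^n, Y_2^n)$ in the sense that, on average:

$$D\left(p(x^n | y_1^n, y_2^n) \,\big\|\, \hat{p}(x^n | g_1^{(n)}(y_1^n), g_2^{(n)}(y_2^n))\right) \le n\varepsilon,$$

where, consistent with standard notation (e.g. [18]), we write
$D\left(p(x^n | y_1^n, y_2^n) \,\|\, \hat{p}(x^n | g_1^{(n)}(y_1^n), g_2^{(n)}(y_2^n))\right)$ as shorthand for

$$\sum_{x^n, y_1^n, y_2^n} p(x^n, y_1^n, y_2^n) \log\frac{p(x^n | y_1^n, y_2^n)}{\hat{p}(x^n | g_1^{(n)}(y_1^n), g_2^{(n)}(y_2^n))}.$$

The relevant question here is the following. What is the minimum distortion $\varepsilon$
that is attainable given $R_1$ and $R_2$?

Considering the CEO problem for this setup, we have:

$$\begin{aligned}
\mathbb{E}\, d(X^n, \hat{X}^n)
&= \frac{1}{n} \sum_{x^n, y_1^n, y_2^n} p(x^n, y_1^n, y_2^n) \log\frac{1}{\hat{x}^n(x^n)} \\
&= \frac{1}{n} D\left(p(x^n | y_1^n, y_2^n) \,\|\, \hat{x}^n(x^n)\right) + \frac{1}{n} H(X^n | Y_1^n, Y_2^n).
\end{aligned}$$

Identifying $\hat{p}(x^n | g_1^{(n)}(Y_1^n), g_2^{(n)}(Y_2^n)) \leftarrow \hat{x}^n(x^n)$, we have:

$$D\left(p(x^n | y_1^n, y_2^n) \,\big\|\, \hat{p}(x^n | g_1^{(n)}(y_1^n), g_2^{(n)}(y_2^n))\right) = n\,\mathbb{E}\, d(X^n, \hat{X}^n) - n H(X | Y_1, Y_2).$$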
Figure 2: An example CEO problem where $X \sim \mathrm{Bernoulli}(\frac{1}{2})$, $\Pr(Y_i = X) =
(1 - \alpha)$, and both encoders are subject to the same rate constraint.



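Thus, finding the minimum possible distortion reduces to an optimization
problem over $\overline{\mathcal{RD}}^{\star}_{\mathrm{CEO}}$. In particular, the minimum attainable distortion $\varepsilon^*$ is given
by

$$\varepsilon^* = \inf\left\{D : (R_1, R_2, D) \in \overline{\mathcal{RD}}^{\star}_{\mathrm{CEO}}\right\} - H(X | Y_1, Y_2). \qquad (7)$$

Moreover, the minimum distortion is obtained by estimating each $X_j$
separately. In other words, there exists an optimal (essentially, for large $n$) estimate
$\hat{p}^*(x^n | \cdot, \cdot)$ (which is itself a function of optimal encoding functions $g_1^{*(n)}(\cdot)$ and
$g_2^{*(n)}(\cdot)$) that can be expressed as a product distribution

$$\hat{p}^*(x^n | \cdot, \cdot) = \prod_{j=1}^n \hat{p}_j^*(x_j | \cdot, \cdot).$$

For this choice of $\hat{p}^*(x^n | \cdot, \cdot)$, we have the following relationship:

$$\frac{1}{n} \sum_{j=1}^n D\left(p(x_j | y_{1,j}, y_{2,j}) \,\big\|\, \hat{p}_j^*(x_j | g_1^{*(n)}(y_1^n), g_2^{*(n)}(y_2^n))\right) = \varepsilon^*.$$

In light of this fact, one can apply Markov's inequality to obtain the following
estimate on peak component-wise distortion:

$$\#\left\{ j : D\left(p(x_j | y_{1,j}, y_{2,j}) \,\big\|\, \hat{p}_j^*(x_j | g_1^{*(n)}(y_1^n), g_2^{*(n)}(y_2^n))\right) \ge \zeta \right\} \le \frac{n}{\zeta}\, \varepsilon^*,$$

where $\#(\cdot)$ is the counting measure.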
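The Markov-type bound on peak component-wise distortion can be illustrated with a few lines of code. This is a sketch under assumptions: the per-component divergences are made-up numbers rather than outputs of an actual code, and `zeta` is the threshold in the bound.

```python
def peak_violations(divergences, zeta):
    """Count components with divergence >= zeta, and Markov's bound on that count.

    divergences : nonnegative per-component divergences (one per index j)
    zeta        : positive threshold
    """
    n = len(divergences)
    mean = sum(divergences) / n           # plays the role of the average distortion
    count = sum(1 for d in divergences if d >= zeta)
    return count, n * mean / zeta         # count is at most n * mean / zeta

# Hypothetical per-component divergences for a block of length 8.
divs = [0.01, 0.02, 0.5, 0.03, 0.04, 0.9, 0.0, 0.1]
count, bound = peak_violations(divs, zeta=0.4)
```

The bound is only informative when `zeta` is large relative to the average divergence; for small thresholds it exceeds the blocklength.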

To make this example more concrete, consider the scenario depicted in Figure
2 where $X \sim \mathrm{Bernoulli}(\frac{1}{2})$ and $Y_i$ is the result of passing $X$ through a binary
symmetric channel with crossover probability $\alpha$ for $i = 1, 2$. To simplify things,
we constrain the rates of each encoder to be at most $R$ bits per channel use.

By performing a brute-force search over a fine mesh of conditional
distributions $\{p(u_i|y_i)\}_{i=1}^2$, we numerically approximate the set of $(R, D)$ pairs such that
$(R, R, D)$ is in the achievable region $\overline{\mathcal{RD}}^{\star}_{\mathrm{CEO}}$ corresponding to the network in
Figure 2. The lower convex envelope of these $(R, D)$ pairs is plotted in Figure 3
for $\alpha \in \{0.01, 0.1, 0.25\}$. Continuing our example above for this concrete choice
of source parameters, we compute the minimum achievable Kullback-Leibler
distance $\varepsilon^*$ according to (7). The result is given in Figure 4.

Figure 3: The distortion-rate function of the network in Figure 2 computed for
$\alpha \in \{0.01, 0.1, 0.25\}$.
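To give a feel for the numerical search described above, here is a heavily restricted sketch. It is not the brute-force search used for the figures: it assumes symmetric binary test channels $U_i = Y_i \oplus N_i$ with $N_i \sim \mathrm{Bernoulli}(\beta)$, a constant $Q$ within each test channel, and takes $R$ to be half the sum rate (attained by time-sharing between the two corner points). The function names and the grid are hypothetical.

```python
import itertools
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(joint, idx):
    """Marginalize joint = {tuple: prob} onto the coordinates listed in idx."""
    out = {}
    for key, p in joint.items():
        k = tuple(key[i] for i in idx)
        out[k] = out.get(k, 0.0) + p
    return out

def H(joint, idx):
    return entropy(marginal(joint, idx).values())

def ceo_point(alpha, beta):
    """Return (R, D) for test channels U_i = Y_i xor Bernoulli(beta)."""
    flip = lambda a, b, eps: eps if a != b else 1 - eps
    joint = {}
    for x, y1, y2, u1, u2 in itertools.product((0, 1), repeat=5):
        joint[(x, y1, y2, u1, u2)] = (0.5 * flip(x, y1, alpha) * flip(x, y2, alpha)
                                      * flip(y1, u1, beta) * flip(y2, u2, beta))
    X, Y1, Y2, U1, U2 = range(5)
    # Sum rate I(U1,U2; Y1,Y2) and distortion H(X | U1,U2).
    sum_rate = H(joint, (Y1, Y2)) + H(joint, (U1, U2)) - H(joint, (Y1, Y2, U1, U2))
    dist = H(joint, (X, U1, U2)) - H(joint, (U1, U2))
    return sum_rate / 2, dist

# Sweep beta from 0 (U_i = Y_i) to 0.5 (U_i independent of Y_i) for alpha = 0.1.
pairs = [ceo_point(0.1, b / 20) for b in range(11)]
```

Each pair is an achievable $(R, D)$ point for the network in Figure 2 under these assumptions; a full search over all $p(u_i|y_i)$ followed by a lower convex envelope could only improve on it.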

These numerical results are intuitively satisfying in the sense that, if $Y_1, Y_2$
are high-quality estimates of $X$ (e.g., $\alpha = 0.01$), then a small increase in the
allowable rate $R$ results in a large relative improvement of $\hat{p}(x | \cdot, \cdot)$, the decoder's
estimate of $p(x | Y_1, Y_2)$. On the other hand, if $Y_1, Y_2$ are poor-quality estimates
of $X$ (e.g., $\alpha = 0.25$), then we require a large increase in the allowable rate $R$
in order to obtain an appreciable improvement of $\hat{p}(x | \cdot, \cdot)$.

One field where this example is directly applicable is machine learning. In 
this case, $X_j$ could represent the class of object $j$, and $Y_{1,j}, Y_{2,j}$ are observable
attributes. In machine learning, one typically estimates the probability that an 
object belongs to a particular class given a set of observable attributes. For this 
type of estimation problem, relative entropy is a natural penalty criterion. 

Another application is to horse-racing with conditionally independent, rate- 
limited side informations. In this case, the doubling rate of the gambler's wealth 
can be expressed in terms of the logarithmic loss distortion measure. This 
example is consistent with the original interpretation of the CEO problem, where 
the CEO makes consecutive business decisions (investments) having outcomes
$X^n$, with the objective of maximizing the wealth of the company. We omit the
details. 
Figure 4: The minimum achievable Kullback-Leibler distance computed
according to (7), i.e., the curves here are those of Figure 3 lowered by the constant
$H(X | Y_1, Y_2)$.

3.5 An Example: Joint estimation of the encoder obser- 
vations 

Suppose one wishes to estimate the encoder observations $(Y_1, Y_2)$. In this case,
the rate region simplifies considerably. In particular, if we tolerate a distortion
$D$ in our estimate of the pair $(Y_1, Y_2)$, then the achievable rate region is the
same as the Slepian-Wolf rate region with each rate constraint relaxed by $D$
bits. Formally:

Theorem 4. If $X = (Y_1, Y_2)$, then $\overline{\mathcal{RD}}^{\star}_{\mathrm{CEO}}$ consists of all vectors $(R_1, R_2, D)$
satisfying

$$\begin{aligned}
R_1 &\ge H(Y_1 | Y_2) - D \\
R_2 &\ge H(Y_2 | Y_1) - D \\
R_1 + R_2 &\ge H(Y_1, Y_2) - D \\
D &\ge 0.
\end{aligned}$$

Proof. First, note that Theorem 3 implies that $\overline{\mathcal{RD}}^{\star}_{\mathrm{CEO}}$ is equivalent to
the union of $(R_1, R_2, D)$ triples satisfying (1) taken over all joint distributions
$p(q)\,p(x, y_1, y_2)\,p(u_1|y_1, q)\,p(u_2|y_2, q)$. Now, since $X = (Y_1, Y_2)$, each of the
inequalities (1) can be lower bounded as follows:

$$\begin{aligned}
R_1 &\ge I(Y_1; U_1 | Y_1, Y_2, Q) + H(Y_1, Y_2 | U_2, Q) - D \\
&= H(Y_2 | U_2, Q) + H(Y_1 | Y_2) - D \\
&\ge H(Y_1 | Y_2) - D \\
R_2 &\ge I(Y_2; U_2 | Y_1, Y_2, Q) + H(Y_1, Y_2 | U_1, Q) - D \\
&= H(Y_1 | U_1, Q) + H(Y_2 | Y_1) - D \\
&\ge H(Y_2 | Y_1) - D \\
R_1 + R_2 &\ge I(U_1; Y_1 | Y_1, Y_2, Q) + I(U_2; Y_2 | Y_1, Y_2, Q) + H(Y_1, Y_2) - D \\
&= H(Y_1, Y_2) - D \\
D &\ge H(Y_1, Y_2 | U_1, U_2, Q) \\
&\ge 0.
\end{aligned}$$

Finally, observe that by setting $U_i = Y_i$ for $i = 1, 2$, we can achieve any point
in this relaxed region (again, a consequence of Theorem 3). □
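Because the region in Theorem 4 is described by four explicit inequalities, membership is easy to test numerically. The sketch below is illustrative only: the function name, the doubly symmetric binary source, and the use of bits are assumptions, and the caller is expected to supply nonnegative rates.

```python
import math

def theorem4_region(p, r1, r2, d):
    """Check the four inequalities of Theorem 4 for a pmf p = {(y1, y2): prob}."""
    def H(probs):
        return -sum(q * math.log2(q) for q in probs if q > 0)
    p1, p2 = {}, {}
    for (y1, y2), q in p.items():
        p1[y1] = p1.get(y1, 0.0) + q
        p2[y2] = p2.get(y2, 0.0) + q
    h12 = H(p.values())               # H(Y1, Y2)
    h1_given_2 = h12 - H(p2.values()) # H(Y1 | Y2)
    h2_given_1 = h12 - H(p1.values()) # H(Y2 | Y1)
    return (d >= 0 and r1 >= h1_given_2 - d and r2 >= h2_given_1 - d
            and r1 + r2 >= h12 - d)

# Hypothetical doubly symmetric binary source with crossover probability 0.1,
# for which H(Y1, Y2) is roughly 1.469 bits.
dsbs = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
lossless_ok = theorem4_region(dsbs, 0.75, 0.75, 0.0)
lossy_ok = theorem4_region(dsbs, 0.5, 0.5, 0.5)
```

With $D = 0$ the check reduces to the Slepian-Wolf conditions; allowing half a bit of distortion admits a sum rate half a bit lower.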



We remark that this result was first proved in [10] by Courtade and Wesel
using a different method.

4 Multiterminal Source Coding 

With Theorem 3 in hand, we are now in a position to characterize the achievable
rate distortion region $\overline{\mathcal{RD}}^{\star}$ for the multiterminal source coding problem under
logarithmic loss. As before, we prove an inner bound first.

4.1 Inner Bound 

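Definition 5. Let $(R_1, R_2, D_1, D_2) \in \mathcal{RD}^{i}$ if and only if there exists a joint
distribution of the form

$$p(y_1, y_2)\,p(u_1|y_1, q)\,p(u_2|y_2, q)\,p(q)$$

where $|\mathcal{U}_1| \le |\mathcal{Y}_1|$, $|\mathcal{U}_2| \le |\mathcal{Y}_2|$, and $|\mathcal{Q}| \le 5$, which satisfies

$$\begin{aligned}
R_1 &\ge I(Y_1; U_1 | U_2, Q) \\
R_2 &\ge I(Y_2; U_2 | U_1, Q) \\
R_1 + R_2 &\ge I(U_1, U_2; Y_1, Y_2 | Q) \\
D_1 &\ge H(Y_1 | U_1, U_2, Q) \\
D_2 &\ge H(Y_2 | U_1, U_2, Q).
\end{aligned}$$

Theorem 5. $\mathcal{RD}^{i} \subseteq \overline{\mathcal{RD}}^{\star}$. That is, all rate distortion vectors in $\mathcal{RD}^{i}$ are
achievable.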

Again, we require an appropriate version of the Berger-Tung inner bound: 
Proposition 2 (Berger-Tung Inner Bound [14], [15]). The rate distortion vector
$(R_1, R_2, D_1, D_2)$ is achievable if

$$\begin{aligned}
R_1 &\ge I(U_1; Y_1 | U_2, Q) \\
R_2 &\ge I(U_2; Y_2 | U_1, Q) \\
R_1 + R_2 &\ge I(U_1, U_2; Y_1, Y_2 | Q) \\
D_1 &\ge \mathbb{E}\left[d(Y_1, f_1(U_1, U_2, Q))\right] \\
D_2 &\ge \mathbb{E}\left[d(Y_2, f_2(U_1, U_2, Q))\right]
\end{aligned}$$

for a joint distribution

$$p(y_1, y_2)\,p(u_1|y_1, q)\,p(u_2|y_2, q)\,p(q)$$

and reproduction functions

$$f_i : \mathcal{U}_1 \times \mathcal{U}_2 \times \mathcal{Q} \to \hat{\mathcal{Y}}_i \quad \text{for } i = 1, 2.$$

Proof of Theorem 5. To prove the theorem, we simply apply Proposition 2 with
the reproduction functions $f_i(U_1, U_2, Q) \triangleq \Pr[Y_i = y_i | U_1, U_2, Q]$. □

Hence, we again see that our inner bound $\mathcal{RD}^{i} \subseteq \overline{\mathcal{RD}}^{\star}$ is nothing more
than the Berger-Tung inner bound specialized to the setting when distortion is
measured under logarithmic loss.

4.2 A Matching Outer Bound 

The main result of this paper is the following theorem.

Theorem 6. $\mathcal{RD}^{i} = \overline{\mathcal{RD}}^{\star}$.

Proof. As before, we note that the cardinality bounds on the alphabets in the
definition of $\mathcal{RD}^{i}$ can be imposed without any loss of generality. This is
discussed in detail in Appendix A.

Assume $(R_1, R_2, D_1, D_2)$ is strict-sense achievable. Observe that proving
that $(R_1, R_2, D_1, D_2) \in \mathcal{RD}^{i}$ will prove the theorem, since $\mathcal{RD}^{i} \subseteq \overline{\mathcal{RD}}^{\star}$ and
$\overline{\mathcal{RD}}^{\star}$ is closed by definition.

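For convenience, define $\mathcal{P}(R_1, R_2)$ to be the set of joint distributions of the
form

$$p(y_1, y_2)\,p(u_1|y_1, q)\,p(u_2|y_2, q)\,p(q)$$

with $|\mathcal{U}_1| \le |\mathcal{Y}_1|$, $|\mathcal{U}_2| \le |\mathcal{Y}_2|$, and $|\mathcal{Q}| \le 4$ satisfying

$$\begin{aligned}
R_1 &\ge I(U_1; Y_1 | U_2, Q) \\
R_2 &\ge I(U_2; Y_2 | U_1, Q) \\
R_1 + R_2 &\ge I(U_1, U_2; Y_1, Y_2 | Q).
\end{aligned}$$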
We remark that $\mathcal{P}(R_1, R_2)$ is compact. We also note that it will suffice to show
the existence of a joint distribution in $\mathcal{P}(R_1, R_2)$ satisfying $H(Y_1 | U_1, U_2, Q) \le
D_1$ and $H(Y_2 | U_1, U_2, Q) \le D_2$ to prove that $(R_1, R_2, D_1, D_2) \in \mathcal{RD}^{i}$.
With foresight, consider random variable $X$ defined as follows

$$X = \begin{cases} (Y_1, 1) & \text{with probability } t \\ (Y_2, 2) & \text{with probability } 1 - t. \end{cases} \qquad (8)$$

In other words, $X = (Y_B, B)$, where $B$ is a Bernoulli random variable
independent of $Y_1, Y_2$. Observe that $Y_1 \leftrightarrow X \leftrightarrow Y_2$ form a Markov chain, and thus, we
are able to apply Theorem 3.

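Since $(R_1, R_2, D_1, D_2)$ is strict-sense achievable, the decoder can construct
reproductions $\hat{Y}_1^n, \hat{Y}_2^n$ satisfying

$$\frac{1}{n} \sum_{j=1}^n \mathbb{E}\, d(Y_{i,j}, \hat{Y}_{i,j}) \le D_i \quad \text{for } i = 1, 2.$$

Fix the encoding operations and set $\hat{X}_j((y_1, 1)) = t\,\hat{Y}_{1,j}(y_1)$ and $\hat{X}_j((y_2, 2)) =
(1 - t)\,\hat{Y}_{2,j}(y_2)$. Then for the CEO problem defined by $(X, Y_1, Y_2)$:

$$\begin{aligned}
\mathbb{E}\, d(X^n, \hat{X}^n)
&= \frac{t}{n} \sum_{j=1}^n \mathbb{E} \log\left(\frac{1}{t\,\hat{Y}_{1,j}(Y_{1,j})}\right) + \frac{1 - t}{n} \sum_{j=1}^n \mathbb{E} \log\left(\frac{1}{(1 - t)\,\hat{Y}_{2,j}(Y_{2,j})}\right) \\
&= h_2(t) + \frac{t}{n} \sum_{j=1}^n \mathbb{E}\, d(Y_{1,j}, \hat{Y}_{1,j}) + \frac{1 - t}{n} \sum_{j=1}^n \mathbb{E}\, d(Y_{2,j}, \hat{Y}_{2,j}) \\
&\le h_2(t) + t D_1 + (1 - t) D_2
\end{aligned}$$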

where $h_2(t)$ is the binary entropy function. Hence, for this CEO problem,
distortion $h_2(t) + t D_1 + (1 - t) D_2$ is achievable and Theorem 3 yields a joint
distribution² $P_t \in \mathcal{P}(R_1, R_2)$ satisfying

$$\begin{aligned}
h_2(t) + t D_1 + (1 - t) D_2 &\ge H(X | U_1^{(t)}, U_2^{(t)}, Q^{(t)}) \\
&= h_2(t) + t H(Y_1 | U_1^{(t)}, U_2^{(t)}, Q^{(t)}) + (1 - t) H(Y_2 | U_1^{(t)}, U_2^{(t)}, Q^{(t)}),
\end{aligned}$$

where the second equality follows by definition of $X$ in (8). For convenience,
define $H_1(P_t) \triangleq H(Y_1 | U_1^{(t)}, U_2^{(t)}, Q^{(t)})$ and $H_2(P_t) \triangleq H(Y_2 | U_1^{(t)}, U_2^{(t)}, Q^{(t)})$.
Note the following two facts:

1. By continuity of entropy, the functions $H_1(\cdot)$ and $H_2(\cdot)$ are continuous on
the compact domain $\mathcal{P}(R_1, R_2)$.

2. The above argument proves the existence of a function $\varphi : [0, 1] \to
\mathcal{P}(R_1, R_2)$ which satisfies

$$t H_1(\varphi(t)) + (1 - t) H_2(\varphi(t)) \le t D_1 + (1 - t) D_2 \quad \text{for all } t \in [0, 1].$$

These two facts satisfy the requirements of Lemma 7 (see Appendix D), and
hence there exists $P_{t_1} \in \mathcal{P}(R_1, R_2)$, $P_{t_2} \in \mathcal{P}(R_1, R_2)$, and $\theta \in [0, 1]$ for which

$$\begin{aligned}
\theta H_1(P_{t_1}) + (1 - \theta) H_1(P_{t_2}) &\le D_1 \\
\theta H_2(P_{t_1}) + (1 - \theta) H_2(P_{t_2}) &\le D_2.
\end{aligned}$$

Timesharing between distributions $P_{t_1}$ and $P_{t_2}$ with probabilities $\theta$ and
$(1 - \theta)$, respectively, yields a distribution $P^* \in \mathcal{P}(R_1, R_2)$ which satisfies $H_1(P^*) \le
D_1$ and $H_2(P^*) \le D_2$. This proves the theorem. □

² Henceforth, we use the superscript $(t)$ to explicitly denote the dependence of the auxiliary
random variables on the distribution parametrized by $t$.
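The decomposition of $H(X | U_1, U_2, Q)$ used in the proof, which follows from the definition of $X$ in (8), can be checked numerically. The sketch below is an illustration under assumptions: a single variable `u` stands in for $(U_1, U_2, Q)$, the joint pmf of $(Y_1, Y_2, U)$ is made up, and logarithms are base 2.

```python
import math

def cond_entropy(joint):
    """H(A|U) in bits for joint = {(a, u): prob}."""
    p_u = {}
    for (a, u), p in joint.items():
        p_u[u] = p_u.get(u, 0.0) + p
    return -sum(p * math.log2(p / p_u[u]) for (a, u), p in joint.items() if p > 0)

def tuned_source(p_y1y2u, t):
    """Joint pmf of (X, U) where X = (Y_B, B), B = 1 w.p. t and B = 2 otherwise."""
    out = {}
    for (y1, y2, u), p in p_y1y2u.items():
        for x, w in (((y1, 1), t), ((y2, 2), 1 - t)):
            out[(x, u)] = out.get((x, u), 0.0) + w * p
    return out

def project(p_y1y2u, i):
    """Joint pmf of (Y_{i+1}, U), i.e. keep coordinate i and the auxiliary u."""
    out = {}
    for key, p in p_y1y2u.items():
        k = (key[i], key[2])
        out[k] = out.get(k, 0.0) + p
    return out

def h2(t):
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

# Hypothetical joint pmf of (Y1, Y2, U); U stands in for (U1, U2, Q).
pmf = {(0, 0, 0): 0.30, (0, 1, 0): 0.05, (1, 0, 0): 0.05, (1, 1, 0): 0.10,
       (0, 0, 1): 0.05, (0, 1, 1): 0.10, (1, 0, 1): 0.10, (1, 1, 1): 0.25}
t = 0.3
lhs = cond_entropy(tuned_source(pmf, t))
rhs = (h2(t) + t * cond_entropy(project(pmf, 0))
       + (1 - t) * cond_entropy(project(pmf, 1)))
```

The two sides agree because $B$ is independent of $(Y_1, Y_2, U)$, so conditioning on $U$ leaves the $h_2(t)$ term untouched and splits the rest by the value of $B$.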



4.3 A stronger converse 

For the CEO problem, we are able to obtain a stronger converse result as
discussed in Section 3.3. We can obtain a similar result for the multiterminal
source coding problem. Indeed, the converse result we just proved continues to
hold even when $\hat{Y}_i^n$ is allowed to be any probability measure on $\mathcal{Y}_i^n$, rather than
a product distribution. The proof of this fact is somewhat involved and can be
found in Appendix E.

We note that the proof of this strengthened converse result (i.e., Theorem
13 in Appendix E) offers a direct proof of the converse of Theorem 6 and
as such we do not require a CEO result (Theorem 3) or a "black box" tuning
argument (Lemma 7). At the heart of this alternative proof lies the Csiszar sum
identity (and a careful choice of auxiliary random variables) which provides a
coupling between the attainable distortions for each source. In the original proof
of Theorem 6 this coupling is accomplished by the tuning argument through
Lemma 7.

Interestingly, the two proofs are similar in spirit, with the key differences be- 
ing the use of the Csiszar sum identity versus the tuning argument. Intuitively, 
the original tuning argument allows a "clumsier" choice of auxiliary random 
variables which leads to a more elegant and transparent proof, but appears 
incapable of establishing the strengthened converse. On the other hand, apply- 
ing the Csiszar sum identity requires a very careful choice of auxiliary random 
variables which, in turn, affords a finer degree of control over various quantities. 



4.4 An Example: The Daily Double 

The Daily Double is a single bet that links together wagers on the winners of 
two consecutive horse races. Winning the Daily Double is dependent on both 
wagers winning together. In general, the outcomes of two consecutive races can
be correlated (e.g., due to track conditions), so a gambler can potentially use
this information to maximize his expected winnings. Let $\mathcal{Y}_1$ and $\mathcal{Y}_2$ be the sets
of horses running in the first and second races, respectively. If horses $y_1$ and
$y_2$ win their respective races, then the payoff is $o(y_1,y_2)$ dollars for each dollar
invested in outcome $(Y_1,Y_2) = (y_1,y_2)$.

Footnote: The timesharing scheme can be embedded in the timesharing variable $Q$, increasing the
cardinality of $Q$ by a factor of two.

There are two betting strategies one can follow: 

1. The gambler can wager a fraction $b_1(y_1)$ of his wealth on horse $y_1$ winning
the first race and parlay his winnings by betting a fraction $b_2(y_2)$ of his
wealth on horse $y_2$ winning the second race. In this case, the gambler's
wealth relative is $b_1(Y_1)b_2(Y_2)o(Y_1,Y_2)$ upon learning the outcome of the
Daily Double. We refer to this betting strategy as the product-wager.

2. The gambler can wager a fraction $b(y_1,y_2)$ of his wealth on horses $(y_1,y_2)$
winning the first and second races, respectively. In this case, the gambler's
wealth relative is $b(Y_1,Y_2)o(Y_1,Y_2)$ upon learning the outcome of the Daily
Double. We refer to this betting strategy as the joint-wager.

Clearly the joint-wager includes the product-wager as a special case. However, 
the product-wager requires less effort to place, so the question is: how do the 
two betting strategies compare? 

To make things interesting, suppose the gamblers have access to rate-limited
information about the first and second race outcomes at rates $R_1, R_2$,
respectively. Further, assume that $R_1 \le H(Y_1)$, $R_2 \le H(Y_2)$, and
$R_1 + R_2 \le H(Y_1,Y_2)$. For $(R_1,R_2)$ and $p(y_1,y_2)$ given, let $\mathcal{P}(R_1,R_2)$ denote the set of
joint pmfs of the form

$$p(q,y_1,y_2,u_1,u_2) = p(q)\,p(y_1,y_2)\,p(u_1|y_1,q)\,p(u_2|y_2,q)$$

which satisfy

$$R_1 \ge I(Y_1;U_1|U_2,Q)$$
$$R_2 \ge I(Y_2;U_2|U_1,Q)$$
$$R_1+R_2 \ge I(Y_1,Y_2;U_1,U_2|Q)$$

for alphabets $\mathcal{U}_1,\mathcal{U}_2,\mathcal{Q}$ satisfying $|\mathcal{U}_i| \le |\mathcal{Y}_i|$ and $|\mathcal{Q}| \le 5$.

Typically, the quality of a bet is measured by the associated doubling rate 
(cf. [is] ) . Theorem [6] implies that the optimal doubling rate for the product- 
wager is given by: 

$$W^*_{\text{p-w}}(p(y_1,y_2)) = \sum_{y_1,y_2} p(y_1,y_2) \log b_1^*(y_1)\,b_2^*(y_2)\,o(y_1,y_2)$$
$$= \mathbb{E}\log o(Y_1,Y_2) - \inf_{p\in\mathcal{P}(R_1,R_2)} \{H(Y_1|U_1,U_2,Q) + H(Y_2|U_1,U_2,Q)\}.$$
Likewise, Theorem 4 implies that the optimal doubling rate for the joint-wager
is given by:

$$W^*_{\text{j-w}}(p(y_1,y_2)) = \sum_{y_1,y_2} p(y_1,y_2) \log b^*(y_1,y_2)\,o(y_1,y_2)$$
$$= \mathbb{E}\log o(Y_1,Y_2) + \min\{R_1 - H(Y_1|Y_2),\; R_2 - H(Y_2|Y_1),\; R_1+R_2 - H(Y_1,Y_2)\}.$$

It is important to note that we do not require the side informations to be 
the same for each type of wager, rather, the side informations are only pro- 
vided at the same rates. Thus, the gambler placing the joint-wager receives 
side information at rates $(R_1,R_2)$ that maximizes his doubling rate, while the
gambler placing the product-wager receives (potentially different) side information
at rates $(R_1,R_2)$ that maximizes his doubling rate. However, as we will
see shortly, for any rates $(R_1,R_2)$, there always exists rate-limited side information
which simultaneously allows each type of gambler to attain their maximum
doubling rate.

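By combining the expressions for $W^*_{\text{p-w}}(p(y_1,y_2))$ and $W^*_{\text{j-w}}(p(y_1,y_2))$, we
find that the difference in doubling rates is given by:

$$\Delta(R_1,R_2) = W^*_{\text{j-w}}(p(y_1,y_2)) - W^*_{\text{p-w}}(p(y_1,y_2))$$
$$= \min\{R_1 - H(Y_1|Y_2),\; R_2 - H(Y_2|Y_1),\; R_1+R_2 - H(Y_1,Y_2)\}$$
$$\quad + \inf_{p\in\mathcal{P}(R_1,R_2)} \{H(Y_1|U_1,U_2,Q) + H(Y_2|U_1,U_2,Q)\} \qquad (9)$$
$$= \inf_{p\in\mathcal{P}(R_1,R_2)} \min\big\{ R_1 - I(Y_1;U_1|U_2,Q) + I(Y_1;Y_2) - I(Y_1;U_2,Q) + H(Y_2|U_1,U_2,Q),$$
$$\qquad R_2 - I(Y_2;U_2|U_1,Q) + I(Y_2;Y_1) - I(Y_2;U_1,Q) + H(Y_1|U_1,U_2,Q),$$
$$\qquad R_1+R_2 - I(Y_1,Y_2;U_1,U_2|Q) + I(Y_1;Y_2|U_1,U_2,Q) \big\}$$
$$= \inf_{p\in\mathcal{P}(R_1,R_2)} I(Y_1;Y_2|U_1,U_2,Q). \qquad (10)$$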



The final equality (10) follows since

• $R_1 \ge I(Y_1;U_1|U_2,Q)$ and $R_2 \ge I(Y_2;U_2|U_1,Q)$ for any $p \in \mathcal{P}(R_1,R_2)$.

• $I(Y_2;Y_1) \ge I(Y_2;U_1,Q)$ and $I(Y_1;Y_2) \ge I(Y_1;U_2,Q)$ for any $p \in \mathcal{P}(R_1,R_2)$
by the data processing inequality.

• The infimum is attained by a $p \in \mathcal{P}(R_1,R_2)$ satisfying $R_1+R_2 =
I(Y_1,Y_2;U_1,U_2|Q)$. See Lemma 10 in Appendix F for details.

• By definition of conditional mutual information,

$$H(Y_i|U_1,U_2,Q) \ge I(Y_1;Y_2|U_1,U_2,Q)$$

for $i = 1,2$.

Let $p^* \in \mathcal{P}(R_1,R_2)$ be the distribution that attains the infimum (such
a $p^*$ always exists); then (10) yields

$$W^*_{\text{j-w}}(p(y_1,y_2)) - W^*_{\text{p-w}}(p(y_1,y_2))$$
$$= \sum_{u_1,u_2,q} p^*(u_1,u_2,q) \sum_{y_1,y_2} p^*(y_1,y_2|u_1,u_2,q) \log \frac{p^*(y_1,y_2|u_1,u_2,q)}{p^*(y_1|u_1,u_2,q)\,p^*(y_2|u_1,u_2,q)}$$
$$= \mathbb{E}_{p^*} \log o(Y_1,Y_2)\,p^*(Y_1,Y_2|U_1,U_2,Q)$$
$$\quad - \mathbb{E}_{p^*} \log o(Y_1,Y_2)\,p^*(Y_1|U_1,U_2,Q)\,p^*(Y_2|U_1,U_2,Q).$$

Hence, we can interpret the auxiliary random variables corresponding to p* as 
optimal rate-limited side informations for both betting strategies. Moreover, 
optimal bets for each strategy are given by 

1. $b^*(y_1,y_2) = p^*(y_1,y_2|u_1,u_2,q)$ for the joint-wager, and

2. $b_1^*(y_1) = p^*(y_1|u_1,u_2,q)$, $b_2^*(y_2) = p^*(y_2|u_1,u_2,q)$ for the product-wager.
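As a small illustration of these bets, the following sketch considers the degenerate case with no side information (zero rates), where the optimal joint bet is the joint pmf and the optimal product bet is the product of the marginals. The function name `doubling_rate_gap` and the dictionary representation of the pmf are assumptions made for this example only. The odds $o(y_1,y_2)$ cancel in the difference of doubling rates, so the gap reduces to the mutual information $I(Y_1;Y_2)$.

```python
import math

def doubling_rate_gap(p):
    """Gap (in bits) between joint-wager and product-wager doubling rates
    when no side information is available.

    p: dict mapping (y1, y2) to its probability. The odds cancel in the
    difference, so they do not appear here.
    """
    p1, p2 = {}, {}
    for (y1, y2), pr in p.items():
        p1[y1] = p1.get(y1, 0.0) + pr
        p2[y2] = p2.get(y2, 0.0) + pr
    gap = 0.0
    for (y1, y2), pr in p.items():
        if pr > 0:
            joint_bet = pr                  # b*(y1, y2) = p(y1, y2)
            product_bet = p1[y1] * p2[y2]   # b1*(y1) b2*(y2) = p(y1) p(y2)
            gap += pr * math.log2(joint_bet / product_bet)
    return gap

# Two binary races whose winners agree with probability 0.9.
correlated = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(doubling_rate_gap(correlated))
```

For independent races the gap is zero, which matches the intuition that the product-wager loses nothing when the outcomes carry no information about each other.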

Since $\mathcal{P}(R_1,R_2) \subseteq \mathcal{P}(R_1',R_2')$ for $R_1 \le R_1'$ and $R_2 \le R_2'$, the function
$\Delta(R_1,R_2)$ is nonincreasing in $R_1$ and $R_2$. Thus, the benefits of using the joint-wager
over the product-wager diminish in the amount of side information available.
It is also not difficult to show that $\Delta(R_1,R_2)$ is jointly convex in $(R_1,R_2)$.

Furthermore, for rate-pairs $(R_1,R_2)$ and $(R_1',R_2')$ satisfying $R_1 < R_1'$ and
$R_2 < R_2'$, there exist corresponding optimal joint- and product-wagers $b^*(y_1,y_2)$
and $b_1^*(y_1)b_2^*(y_2)$, and $b^{*\prime}(y_1,y_2)$ and $b_1^{*\prime}(y_1)b_2^{*\prime}(y_2)$, respectively, satisfying

$$D\big(b^{*\prime}(y_1,y_2)\,\|\,b_1^{*\prime}(y_1)b_2^{*\prime}(y_2)\big) < D\big(b^*(y_1,y_2)\,\|\,b_1^*(y_1)b_2^*(y_2)\big). \qquad (11)$$

So, roughly speaking, the joint-wager and product-wager look "more alike" as
the amount of side information is increased. The proof of the strict inequality
in (11) can be inferred from the proof of Lemma 10 in Appendix F.



To conclude this example, we note that A(i?i,i?2) enjoys a great deal of 
symmetry near the origin in the sense that side information from either encoder 
contributes approximately the same amount to the improvement of the product- 
wager. We state this formally as a theorem: 

Theorem 7. Define $\rho_m(Y_1,Y_2)$ to be the Hirschfeld-Gebelein-Renyi maximal
correlation between random variables $Y_1$ and $Y_2$. Then, $\Delta(R_1,R_2) \ge I(Y_1;Y_2) -
\rho_m^2(Y_1,Y_2) \cdot (R_1+R_2)$. Moreover, this bound is tight as $(R_1,R_2) \to (0,0)$.

Proof. If $R_2 = 0$, then it is readily verified that $\Delta(R_1,0)$ can be expressed as
follows:

$$\Delta(R_1,0) = I(Y_1;Y_2) - \max_{\substack{p(u_1|y_1)\,:\,I(U_1;Y_1)=R_1,\\ U_1 \to Y_1 \to Y_2,\; |\mathcal{U}_1| \le |\mathcal{Y}_1|+1}} I(U_1;Y_2).$$



By symmetry:

$$\Delta(0,R_2) = I(Y_1;Y_2) - \max_{\substack{p(u_2|y_2)\,:\,I(U_2;Y_2)=R_2,\\ U_2 \to Y_2 \to Y_1,\; |\mathcal{U}_2| \le |\mathcal{Y}_2|+1}} I(U_2;Y_1).$$

Here, we can apply a result of Erkip [19, Theorem 10] to evaluate the gradient
of $\Delta(R_1,R_2)$ at $(R_1,R_2) = (0,0)$:

$$\frac{\partial}{\partial R_1}\Delta(R_1,R_2)\Big|_{(R_1,R_2)=(0,0)} = \frac{\partial}{\partial R_2}\Delta(R_1,R_2)\Big|_{(R_1,R_2)=(0,0)} = -\rho_m^2(Y_1,Y_2). \qquad (12)$$



Note, since $\Delta(R_1,0)$ and $\Delta(0,R_2)$ are each convex in their respective variable
and $\Delta(0,0) = I(Y_1;Y_2)$, we have

$$\Delta(R_1,0) \ge I(Y_1;Y_2) - \rho_m^2(Y_1,Y_2)\,R_1$$
$$\Delta(0,R_2) \ge I(Y_1;Y_2) - \rho_m^2(Y_1,Y_2)\,R_2. \qquad (13)$$



Taking this one step further, for $\nu_1,\nu_2 \ge 0$, we can evaluate the one-sided
derivative:

$$\lim_{\lambda \downarrow 0} \frac{\Delta(\lambda\nu_1,\lambda\nu_2) - \Delta(0,0)}{\lambda} = -\rho_m^2(Y_1,Y_2)\cdot(\nu_1+\nu_2). \qquad (14)$$



We remark that ( 14 ) does not follow immediately from ( 12 ) since the point at 
which we are taking the derivatives (i.e., the origin) does not lie in an open 
neighborhood of the domain. Nonetheless, the expected result holds. 

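Since $\Delta(R_1,R_2)$ is convex, we obtain an upper bound on the one-sided
derivative as follows:

$$\lim_{\lambda \downarrow 0} \frac{\Delta(\lambda\nu_1,\lambda\nu_2) - \Delta(0,0)}{\lambda} \le \lim_{\lambda \downarrow 0} \frac{\tfrac{1}{2}\Delta(2\lambda\nu_1,0) + \tfrac{1}{2}\Delta(0,2\lambda\nu_2) - \Delta(0,0)}{\lambda}$$
$$= \frac{1}{2}\lim_{\lambda \downarrow 0} \frac{\Delta(\lambda 2\nu_1,0) - \Delta(0,0)}{\lambda} + \frac{1}{2}\lim_{\lambda \downarrow 0} \frac{\Delta(0,\lambda 2\nu_2) - \Delta(0,0)}{\lambda}$$
$$= -\rho_m^2(Y_1,Y_2)\cdot(\nu_1+\nu_2),$$

where the final equality follows by (12) and the positive homogeneity of the
directional derivative.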

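Therefore, to complete the proof of (14), it suffices to prove the lower bound

$$\lim_{\lambda \downarrow 0} \frac{\Delta(\lambda\nu_1,\lambda\nu_2) - \Delta(0,0)}{\lambda} \ge -\rho_m^2(Y_1,Y_2)\cdot(\nu_1+\nu_2).$$

To this end, fix $\lambda,\nu_1,\nu_2 > 0$ and observe that

$$\frac{\Delta(\lambda\nu_1,\lambda\nu_2) - \Delta(0,0)}{\lambda}$$
$$= \frac{1}{\lambda} \inf_{p\in\mathcal{P}(\lambda\nu_1,\lambda\nu_2)} \{ I(Y_1;Y_2|U_1,U_2,Q) - I(Y_1;Y_2) \} \qquad (15)$$
$$= \frac{1}{\lambda} \inf_{p\in\mathcal{P}(\lambda\nu_1,\lambda\nu_2)} \{ I(Y_1,Y_2;U_1,U_2|Q) - I(Y_1;U_1,U_2|Q) - I(Y_2;U_1,U_2|Q) \}$$
$$= (\nu_1+\nu_2) - \frac{1}{\lambda}\big( I_{p^*}(Y_1;U_1,U_2|Q) + I_{p^*}(Y_2;U_1,U_2|Q) \big) \qquad (16)$$
$$= (\nu_1+\nu_2) - \frac{1}{\lambda}\big( I_{p^*}(Y_1;U_1|U_2,Q) + I_{p^*}(Y_1;U_2|Q) + I_{p^*}(Y_2;U_2|U_1,Q) + I_{p^*}(Y_2;U_1|Q) \big)$$
$$\ge (\nu_1+\nu_2) - \rho_m^2(Y_1,Y_2)(2\nu_1+2\nu_2) - \frac{1-\rho_m^2(Y_1,Y_2)}{\lambda}\big( I_{p^*}(Y_1;U_1|U_2,Q) + I_{p^*}(Y_2;U_2|U_1,Q) \big) \qquad (17)$$
$$= -\rho_m^2(Y_1,Y_2)(\nu_1+\nu_2) + (1-\rho_m^2(Y_1,Y_2))(\nu_1+\nu_2) - \frac{1-\rho_m^2(Y_1,Y_2)}{\lambda}\big( I_{p^*}(Y_1;U_1|U_2,Q) + I_{p^*}(Y_2;U_2|U_1,Q) \big)$$
$$\ge -\rho_m^2(Y_1,Y_2)(\nu_1+\nu_2). \qquad (18)$$

In the above string of inequalities: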



• (15) follows by definition of $\Delta(R_1,R_2)$.

• Equality (16) follows since Lemma 10 guarantees that the infimum is attained
in (15) for some $p^* \in \mathcal{P}(\lambda\nu_1,\lambda\nu_2)$ satisfying $I_{p^*}(Y_1,Y_2;U_1,U_2|Q) =
\lambda(\nu_1+\nu_2)$. Here, we write $I_{p^*}(Y_1,Y_2;U_1,U_2|Q)$ to denote the mutual
information $I(Y_1,Y_2;U_1,U_2|Q)$ evaluated for the distribution $p^*$.

• To see that (17) holds, note that

$$I_{p^*}(Y_2;U_2|Q) = \lambda\nu_1 + \lambda\nu_2 - I_{p^*}(Y_1;U_1|U_2,Q),$$

and thus

$$I(Y_1;Y_2) - \rho_m^2(Y_1,Y_2)\big(\lambda\nu_1 + \lambda\nu_2 - I_{p^*}(Y_1;U_1|U_2,Q)\big)$$
$$\le \Delta\big(0,\, \lambda\nu_1 + \lambda\nu_2 - I_{p^*}(Y_1;U_1|U_2,Q)\big) \qquad (19)$$
$$= I(Y_1;Y_2) - \max_{\substack{p(u_2|y_2)\,:\,I(Y_2;U_2) \le \lambda\nu_1+\lambda\nu_2 - I_{p^*}(Y_1;U_1|U_2,Q),\\ U_2 \to Y_2 \to Y_1}} I(U_2;Y_1) \qquad (20)$$
$$\le I(Y_1;Y_2) - I_{p^*}(Y_1;U_2|Q), \qquad (21)$$

which implies

$$-\rho_m^2(Y_1,Y_2)\big(\lambda\nu_1 + \lambda\nu_2 - I_{p^*}(Y_1;U_1|U_2,Q)\big) \le -I_{p^*}(Y_1;U_2|Q).$$

The above steps are justified as follows:

  - (19) follows from (13).

  - (20) follows by definition of the function $\Delta(0,x)$.

  - (21) follows since $Q$ is independent of $Y_1,Y_2$ (by definition of $p^*$), and
    thus the pair $(U_2,Q)$ lies in the set over which we take the maximum in
    (20).

By symmetry, we conclude that

$$-\big(I_{p^*}(Y_1;U_2|Q) + I_{p^*}(Y_2;U_1|Q)\big)$$
$$\ge -\rho_m^2(Y_1,Y_2)\big(2\lambda\nu_1 + 2\lambda\nu_2 - I_{p^*}(Y_1;U_1|U_2,Q) - I_{p^*}(Y_2;U_2|U_1,Q)\big),$$

and (17) follows.

• (18) follows since $\lambda\nu_1 \ge I_{p^*}(Y_1;U_1|U_2,Q)$ and $\lambda\nu_2 \ge I_{p^*}(Y_2;U_2|U_1,Q)$ for
$p^* \in \mathcal{P}(\lambda\nu_1,\lambda\nu_2)$.

□ 
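The bound of Theorem 7 is easy to evaluate for binary sources. The sketch below is an illustration under a stated assumption: it relies on the standard fact that, for two binary random variables, the maximal correlation coincides with the absolute value of the Pearson correlation coefficient, so it does not apply to larger alphabets. The function names and the dictionary representation of the pmf are hypothetical and introduced only for this example.

```python
import math

def mutual_information(p):
    """I(Y1;Y2) in bits for a joint pmf p: dict (y1, y2) -> probability."""
    p1, p2 = {}, {}
    for (y1, y2), pr in p.items():
        p1[y1] = p1.get(y1, 0.0) + pr
        p2[y2] = p2.get(y2, 0.0) + pr
    return sum(pr * math.log2(pr / (p1[y1] * p2[y2]))
               for (y1, y2), pr in p.items() if pr > 0)

def binary_maximal_correlation(p):
    """Maximal correlation of two {0,1}-valued variables.

    Assumes binary alphabets, where it equals |Pearson correlation|.
    """
    m1 = p.get((1, 0), 0.0) + p.get((1, 1), 0.0)   # P(Y1 = 1)
    m2 = p.get((0, 1), 0.0) + p.get((1, 1), 0.0)   # P(Y2 = 1)
    cov = p.get((1, 1), 0.0) - m1 * m2
    return abs(cov) / math.sqrt(m1 * (1 - m1) * m2 * (1 - m2))

def theorem7_lower_bound(p, r1, r2):
    """Right-hand side of Theorem 7: I(Y1;Y2) - rho_m^2 * (R1 + R2)."""
    rho = binary_maximal_correlation(p)
    return mutual_information(p) - rho ** 2 * (r1 + r2)

# Doubly symmetric binary source with crossover probability 0.1.
dsbs = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(binary_maximal_correlation(dsbs), theorem7_lower_bound(dsbs, 0.1, 0.1))
```

At $(R_1,R_2) = (0,0)$ the bound reduces to $I(Y_1;Y_2)$, consistent with $\Delta(0,0) = I(Y_1;Y_2)$.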



4.5 An Application: List Decoding 

In the previous example, we did not take advantage of the stronger converse
result which we proved in Appendix E (see the discussion in Section 4.3). In
this section, we give an application that requires this strengthened result.
Formally, a 2-list code (of blocklength $n$) consists of encoding functions:

$$g_i^{(n)} : \mathcal{Y}_i^n \to \{1,\dots,M_i^{(n)}\} \quad \text{for } i = 1,2$$

and list decoding functions

$$L_1^{(n)} : \{1,\dots,M_1^{(n)}\} \times \{1,\dots,M_2^{(n)}\} \to 2^{\mathcal{Y}_1^n}$$
$$L_2^{(n)} : \{1,\dots,M_1^{(n)}\} \times \{1,\dots,M_2^{(n)}\} \to 2^{\mathcal{Y}_2^n}.$$

A list decoding tuple $(R_1,R_2,\Delta_1,\Delta_2)$ is achievable if, for any $\epsilon > 0$, there exists
a 2-list code of blocklength $n$ satisfying the rate constraints

$$\frac{1}{n}\log M_1^{(n)} \le R_1 + \epsilon$$
$$\frac{1}{n}\log M_2^{(n)} \le R_2 + \epsilon,$$

and the probability of list-decoding error constraints

$$\Pr\big[Y_1^n \notin L_1^{(n)}\big(g_1^{(n)}(Y_1^n), g_2^{(n)}(Y_2^n)\big)\big] \le \epsilon$$
$$\Pr\big[Y_2^n \notin L_2^{(n)}\big(g_1^{(n)}(Y_1^n), g_2^{(n)}(Y_2^n)\big)\big] \le \epsilon,$$

with list sizes

$$\frac{1}{n}\log|L_1^{(n)}| \le \Delta_1 + \epsilon$$
$$\frac{1}{n}\log|L_2^{(n)}| \le \Delta_2 + \epsilon.$$

With a 2-list code so defined, the following theorem shows that the 2-list decoding
problem and multiterminal source coding problem under logarithmic loss
are equivalent (inasmuch as the achievable regions are identical):

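Theorem 8. The list decoding tuple $(R_1,R_2,\Delta_1,\Delta_2)$ is achievable if and only
if

$$R_1 \ge I(U_1;Y_1|U_2,Q)$$
$$R_2 \ge I(U_2;Y_2|U_1,Q)$$
$$R_1+R_2 \ge I(U_1,U_2;Y_1,Y_2|Q)$$
$$\Delta_1 \ge H(Y_1|U_1,U_2,Q)$$
$$\Delta_2 \ge H(Y_2|U_1,U_2,Q)$$

for some joint distribution

$$p(y_1,y_2,u_1,u_2,q) = p(y_1,y_2)\,p(u_1|y_1,q)\,p(u_2|y_2,q)\,p(q),$$

where $|\mathcal{U}_1| \le |\mathcal{Y}_1|$, $|\mathcal{U}_2| \le |\mathcal{Y}_2|$, and $|\mathcal{Q}| \le 5$.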

Remark 2. We note that a similar connection to list decoding can be made for 
other multiterminal scenarios, in particular the CEO problem. 

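To prove the theorem, we require a slightly modified version of [20, Lemma 1]:

Lemma 2. If the list decoding tuple $(R_1,R_2,\Delta_1,\Delta_2)$ is achieved by a sequence
of 2-list codes $\{g_1^{(n)}, g_2^{(n)}, L_1^{(n)}, L_2^{(n)}\}_{n\to\infty}$, then

$$H\big(Y_1^n \,\big|\, g_1^{(n)}(Y_1^n), g_2^{(n)}(Y_2^n)\big) \le \log|L_1^{(n)}| + n\epsilon_n$$
$$H\big(Y_2^n \,\big|\, g_1^{(n)}(Y_1^n), g_2^{(n)}(Y_2^n)\big) \le \log|L_2^{(n)}| + n\epsilon_n,$$

where $\epsilon_n \to 0$ as $n \to \infty$.

Proof. The proof is virtually identical to that of [20, Lemma 1], and is therefore
omitted. □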

Proof of Theorem 8. First observe that the direct part is trivial. Indeed, for
a joint distribution $p(y_1,y_2,u_1,u_2,q) = p(y_1,y_2)p(u_1|y_1,q)p(u_2|y_2,q)p(q)$, apply
the Berger-Tung achievability scheme and take $L_i^{(n)}$ to be the set of $y_i^n$ sequences
which are jointly typical with the decoded quantizations $(U_1^n, U_2^n)$. This set has
cardinality no larger than $2^{n(H(Y_i|U_1,U_2,Q)+\epsilon)}$, which proves achievability.

To see the converse, note that setting

$$\hat{Y}_i^n(y_i^n) = \Pr\big[Y_i^n = y_i^n \,\big|\, g_1^{(n)}(Y_1^n), g_2^{(n)}(Y_2^n)\big]$$

achieves a logarithmic loss of $\frac{1}{n}H\big(Y_i^n \,\big|\, g_1^{(n)}(Y_1^n), g_2^{(n)}(Y_2^n)\big)$ for source $i$ in the
setting where reproductions are not restricted to product distributions. Applying
the strengthened converse of Theorem 6 together with Lemma 2 yields the
desired result. □



5 Relationship to the General Multiterminal 
Source Coding Problem 

In this section, we relate our results for logarithmic loss to multiterminal source 
coding problems with arbitrary distortion measures and reproduction alphabets. 

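As before, we let $\{Y_{1,j}, Y_{2,j}\}_{j=1}^n$ be a sequence of $n$ independent, identically
distributed random variables with finite alphabets $\mathcal{Y}_1$ and $\mathcal{Y}_2$, respectively, and
joint pmf $p(y_1,y_2)$.

In this section, the reproduction alphabets $\breve{\mathcal{Y}}_i$, $i = 1,2$, are arbitrary. We
also consider generic distortion measures:

$$d_i : \mathcal{Y}_i \times \breve{\mathcal{Y}}_i \to \mathbb{R}^+ \quad \text{for } i = 1,2,$$

where $\mathbb{R}^+$ denotes the set of nonnegative real numbers. The sequence distortion
is then defined as follows:

$$d_i(y_i^n, \breve{y}_i^n) = \frac{1}{n}\sum_{j=1}^n d_i(y_{i,j}, \breve{y}_{i,j}).$$

We will continue to let $d(\cdot,\cdot)$ and $\hat{\mathcal{Y}}_1, \hat{\mathcal{Y}}_2$ denote the logarithmic loss distortion
measure and the associated reproduction alphabets, respectively.

A rate distortion code (of blocklength $n$) consists of encoding functions:

$$g_i^{(n)} : \mathcal{Y}_i^n \to \{1,\dots,M_i^{(n)}\} \quad \text{for } i = 1,2$$

and decoding functions

$$\psi_i^{(n)} : \{1,\dots,M_1^{(n)}\} \times \{1,\dots,M_2^{(n)}\} \to \breve{\mathcal{Y}}_i^n \quad \text{for } i = 1,2.$$

A rate distortion vector $(R_1,R_2,D_1,D_2)$ is strict-sense achievable if there
exists a blocklength $n$, encoding functions $g_1^{(n)}, g_2^{(n)}$ and a decoder $(\psi_1^{(n)}, \psi_2^{(n)})$
such that

$$R_i \ge \frac{1}{n}\log M_i^{(n)} \quad \text{for } i = 1,2 \qquad (22)$$
$$D_i \ge \mathbb{E}\, d_i(Y_i^n, \breve{Y}_i^n) \quad \text{for } i = 1,2, \qquad (23)$$

where

$$\breve{Y}_i^n = \psi_i^{(n)}\big(g_1^{(n)}(Y_1^n), g_2^{(n)}(Y_2^n)\big) \quad \text{for } i = 1,2.$$

For these functions, we define the quantity

$$\beta_i\big(g_1^{(n)}, g_2^{(n)}, \psi_1^{(n)}, \psi_2^{(n)}\big) := \frac{1}{n}\sum_{j=1}^n \mathbb{E}\log\Big(\sum_{y_i\in\mathcal{Y}_i} 2^{-d_i(y_i, \breve{Y}_{i,j})}\Big) \quad \text{for } i = 1,2. \qquad (24)$$

Now, let $\beta_i(R_1,R_2,D_1,D_2)$ be the infimum of the $\beta_i\big(g_1^{(n)}, g_2^{(n)}, \psi_1^{(n)}, \psi_2^{(n)}\big)$'s,
where the infimum is taken over all codes that achieve the rate distortion vector
$(R_1,R_2,D_1,D_2)$.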

At this point it is instructive to pause and consider some examples. 

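Example 1 (Binary Sources and Hamming Distortion). For $i = 1,2$, let $\mathcal{Y}_i =
\breve{\mathcal{Y}}_i = \{0,1\}$ and let $d_i$ be the $\alpha$-scaled Hamming distortion measure:

$$d_i(y_i,\breve{y}_i) = \begin{cases} 0 & \text{if } \breve{y}_i = y_i, \\ \alpha & \text{if } \breve{y}_i \ne y_i. \end{cases}$$

In this case,

$$\sum_{y_i\in\mathcal{Y}_i} 2^{-d_i(y_i, \breve{Y}_{i,j})} = 2^0 + 2^{-\alpha}, \qquad (25)$$

so $\beta_i(R_1,R_2,D_1,D_2) = \log(1 + 2^{-\alpha})$ for any $(R_1,R_2,D_1,D_2)$. This notion that
$\beta_i(R_1,R_2,D_1,D_2)$ is a constant extends to all distortion measures for which the
columns of the $|\mathcal{Y}_i| \times |\breve{\mathcal{Y}}_i|$ distortion matrix are permutations of one another.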

Example 2 (Binary Sources and Erasure Distortion). For $i = 1,2$, let $\mathcal{Y}_i =
\{0,1\}$, $\breve{\mathcal{Y}}_i = \{0,1,e\}$ and let $d_i$ be the standard erasure distortion measure:

$$d_i(y_i,\breve{y}_i) = \begin{cases} 0 & \text{if } \breve{y}_i = y_i, \\ 1 & \text{if } \breve{y}_i = e, \\ \infty & \text{if } \breve{y}_i \in \{0,1\} \text{ and } \breve{y}_i \ne y_i. \end{cases}$$

In this case,

$$\sum_{y_i\in\mathcal{Y}_i} 2^{-d_i(y_i, \breve{Y}_{i,j})} = \begin{cases} 2^{-\infty} + 2^0 = 1 & \text{if } \breve{Y}_{i,j} \in \{0,1\}, \\ 2^{-1} + 2^{-1} = 1 & \text{if } \breve{Y}_{i,j} = e, \end{cases} \qquad (26)$$

so $\beta_i(R_1,R_2,D_1,D_2) = 0$ for any $(R_1,R_2,D_1,D_2)$. This result can easily be
extended to erasure distortion on larger alphabets by setting the penalty to $\log|\mathcal{Y}_i|$
when $\breve{Y}_i = e$.

Theorem 9. Suppose $(R_1,R_2,D_1,D_2)$ is strict-sense achievable for the general
multiterminal source coding problem. Then

$$R_1 \ge I(U_1;Y_1|U_2,Q)$$
$$R_2 \ge I(U_2;Y_2|U_1,Q)$$
$$R_1+R_2 \ge I(U_1,U_2;Y_1,Y_2|Q) \qquad (27)$$
$$D_1 \ge H(Y_1|U_1,U_2,Q) - \beta_1(R_1,R_2,D_1,D_2)$$
$$D_2 \ge H(Y_2|U_1,U_2,Q) - \beta_2(R_1,R_2,D_1,D_2)$$

for some joint distribution $p(y_1,y_2)p(q)p(u_1|y_1,q)p(u_2|y_2,q)$ with $|\mathcal{U}_i| \le |\mathcal{Y}_i|$ and
$|\mathcal{Q}| \le 5$.

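Proof. Since $(R_1,R_2,D_1,D_2)$ is strict-sense achievable, there exists a blocklength
$n$, encoding functions $g_1^{(n)}, g_2^{(n)}$ and a decoder $(\psi_1^{(n)}, \psi_2^{(n)})$ satisfying (22)-(23).
Given these functions, the decoder can generate reproductions $\breve{Y}_1^n, \breve{Y}_2^n$
satisfying the average distortion constraints (23). From the reproduction $\breve{Y}_i^n$
we construct the reproduction $\hat{Y}_i^n$ as follows:

$$\hat{Y}_{i,j}(y_i) = \frac{2^{-d_i(y_i, \breve{Y}_{i,j})}}{\sum_{y_i'\in\mathcal{Y}_i} 2^{-d_i(y_i', \breve{Y}_{i,j})}}.$$

Now, using the logarithmic loss distortion measure, observe that $\hat{Y}_i^n$ satisfies

$$\mathbb{E}\, d(Y_i^n, \hat{Y}_i^n) = \frac{1}{n}\sum_{j=1}^n \mathbb{E}\log\big(2^{d_i(Y_{i,j}, \breve{Y}_{i,j})}\big) + \frac{1}{n}\sum_{j=1}^n \mathbb{E}\log\Big(\sum_{y_i'\in\mathcal{Y}_i} 2^{-d_i(y_i', \breve{Y}_{i,j})}\Big)$$
$$= \mathbb{E}\, d_i(Y_i^n, \breve{Y}_i^n) + \beta_i\big(g_1^{(n)}, g_2^{(n)}, \psi_1^{(n)}, \psi_2^{(n)}\big)$$
$$\le D_i + \beta_i\big(g_1^{(n)}, g_2^{(n)}, \psi_1^{(n)}, \psi_2^{(n)}\big)$$
$$:= \hat{D}_i.$$

Thus, $(R_1,R_2,\hat{D}_1,\hat{D}_2)$ is achievable for the multiterminal source coding problem
with the logarithmic loss distortion measure. Applying Theorem 6 and taking
the infimum over all coding schemes that achieve $(R_1,R_2,D_1,D_2)$ proves the
theorem. □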
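The construction used in this proof can be checked numerically on small alphabets. The following sketch is illustrative only: the helper names (`soft_reproduction`, `hamming`, `erasure`) are hypothetical, and logarithms are taken base 2. It builds the pmf proportional to $2^{-d(y,\breve{y})}$ and returns the logarithm of its normalizer, which is the per-letter contribution to $\beta_i$.

```python
import math

def soft_reproduction(d, source_alphabet, y_rep):
    """Turn a hard reproduction symbol y_rep into a pmf on the source alphabet.

    Returns (pmf, log_z), where pmf[y] is proportional to 2^{-d(y, y_rep)}
    and log_z = log2 of the normalizing sum.
    """
    weights = {y: 2.0 ** (-d(y, y_rep)) for y in source_alphabet}
    z = sum(weights.values())
    return {y: w / z for y, w in weights.items()}, math.log2(z)

def hamming(alpha):
    """alpha-scaled Hamming distortion on a common alphabet."""
    return lambda y, y_rep: 0.0 if y == y_rep else alpha

def erasure(y, y_rep):
    """Standard erasure distortion; the symbol "e" denotes an erasure."""
    if y_rep == y:
        return 0.0
    if y_rep == "e":
        return 1.0
    return math.inf

# Hamming distortion with alpha = 2: log-loss equals distortion plus log2(1 + 2^-2).
pmf, log_z = soft_reproduction(hamming(2.0), [0, 1], 0)
for y in [0, 1]:
    print(y, math.log2(1 / pmf[y]), hamming(2.0)(y, 0) + log_z)

# Erasure distortion: the normalizer is 1 for every reproduction symbol.
print([soft_reproduction(erasure, [0, 1], r)[1] for r in [0, 1, "e"]])
```

The two printed columns for the Hamming case agree, which is the per-letter identity behind the displayed chain of equalities, and the erasure case recovers a normalizer of one.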

This outer bound is interesting because the region is defined over the same 
set of probability distributions that define the Berger-Tung inner bound. While 
the $\beta_i$'s can be difficult to compute in general, we have shown that they can be
readily determined for many popular distortion measures. As an application, we
now give a quantitative approximation of the rate distortion region for binary
sources subject to Hamming distortion constraints. Before proceeding, we prove
the following lemma.

Lemma 3. Suppose $(R_1,R_2,D_1,D_2)$ is strict-sense achievable for the multiterminal
source coding problem with binary sources and $d_i$ equal to the $\alpha_i$-scaled
Hamming distortion measure, for $i = 1,2$. Then the Berger-Tung achievability
scheme can achieve a point $(R_1,R_2,\tilde{D}_1,\tilde{D}_2)$ satisfying

$$\tilde{D}_i - D_i \le \Big(\frac{\alpha_i}{2} - 1\Big) H_i + \log(1 + 2^{-\alpha_i})$$

for some $H_i \in [0,1]$, $i = 1,2$.

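Proof. By Theorem 9, $(R_1,R_2,D_1,D_2)$ satisfy (27) for some joint distribution
$p(y_1,y_2)p(q)p(u_1|y_1,q)p(u_2|y_2,q)$. For this distribution, define the reproduction
functions

$$\breve{Y}_i(U_1,U_2,Q) = \arg\max_{y_i} p(y_i|U_1,U_2,Q) \quad \text{for } i = 1,2. \qquad (28)$$

Then, observe that for $i = 1,2$:

$$\mathbb{E}\, d_i(Y_i,\breve{Y}_i) = \sum_{u_1,u_2,q} p(u_1,u_2,q) \Big[\alpha_i \cdot \min_{y_i} p(y_i|u_1,u_2,q) + 0 \cdot \max_{y_i} p(y_i|u_1,u_2,q)\Big]$$
$$= \alpha_i \sum_{u_1,u_2,q} p(u_1,u_2,q) \cdot \min_{y_i} p(y_i|u_1,u_2,q)$$
$$\le \frac{\alpha_i}{2} \sum_{u_1,u_2,q} p(u_1,u_2,q) \cdot H(Y_i|U_1,U_2,Q = u_1,u_2,q) \qquad (29)$$
$$= \frac{\alpha_i}{2} H(Y_i|U_1,U_2,Q),$$

where (29) follows from the fact that $2p \le h_2(p)$ for $0 \le p \le 0.5$. Thus,
$\tilde{D}_i = \frac{\alpha_i}{2}H(Y_i|U_1,U_2,Q)$ is achievable for rates $(R_1,R_2)$ using the Berger-Tung
achievability scheme. Combining this with the fact that $D_i \ge H(Y_i|U_1,U_2,Q) -
\log(1 + 2^{-\alpha_i})$, we see that

$$\tilde{D}_i - D_i \le \frac{\alpha_i}{2}H(Y_i|U_1,U_2,Q) - H(Y_i|U_1,U_2,Q) + \log(1 + 2^{-\alpha_i}).$$

□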
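The only analytic ingredient in step (29) is the inequality $2p \le h_2(p)$ on $[0, 0.5]$, which holds because the binary entropy function is concave and passes through $(0,0)$ and $(0.5,1)$. A quick numerical sanity check, with a hypothetical helper `h2` and an arbitrary grid resolution, might look like this:

```python
import math

def h2(p):
    """Binary entropy in bits, with the convention 0 * log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Check 2p <= h2(p) on a grid over [0, 0.5]; the tolerance absorbs rounding.
grid = [k / 2000 for k in range(1001)]
print(all(2 * p <= h2(p) + 1e-12 for p in grid))
```

Equality holds at both endpoints of the interval, so the factor $\alpha_i/2$ in Lemma 3 cannot be improved by this argument alone.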

Lemma 3 allows us to give a quantitative outer bound on the achievable rate
distortion region in terms of the Berger-Tung inner bound.

Corollary 1. Suppose $(R_1,R_2,D_1^{(1)},D_2^{(1)})$ is strict-sense achievable for the
multiterminal source coding problem with binary sources and $d_i$ equal to the
standard 1-scaled Hamming distortion measure, for $i = 1,2$. Then the Berger-Tung
achievability scheme can achieve a point $(R_1,R_2,\tilde{D}_1^{(1)},\tilde{D}_2^{(1)})$, where

$$\tilde{D}_i^{(1)} - D_i^{(1)} \le \frac{1}{2}\log\Big(\frac{5}{4}\Big) < 0.161 \quad \text{for } i = 1,2.$$

Proof. For rates $(R_1,R_2)$, note that distortions $(D_1,D_2)$ are strict-sense achievable
for the $\alpha_i$-scaled Hamming distortion measures if and only if distortions
$(D_1^{(1)},D_2^{(1)}) = (\frac{1}{\alpha_1}D_1, \frac{1}{\alpha_2}D_2)$ are strict-sense achievable for the 1-scaled Hamming
distortion measure. Likewise, the point $(R_1,R_2,\tilde{D}_1,\tilde{D}_2)$ is achieved by the
Berger-Tung coding scheme for the $\alpha_i$-scaled Hamming distortion measures if
and only if $(R_1,R_2,\frac{1}{\alpha_1}\tilde{D}_1,\frac{1}{\alpha_2}\tilde{D}_2)$ is achieved by the Berger-Tung coding scheme
for the 1-scaled Hamming distortion measure.

Thus, applying Lemma 3, we can use the Berger-Tung achievability scheme
to achieve a point $(R_1,R_2,\tilde{D}_1^{(1)},\tilde{D}_2^{(1)})$ satisfying

$$\tilde{D}_i^{(1)} - D_i^{(1)} = \frac{1}{\alpha_i}\big(\tilde{D}_i - D_i\big) \le \frac{1}{\alpha_i}\Big(\frac{\alpha_i}{2} - 1\Big)H_i + \frac{1}{\alpha_i}\log(1 + 2^{-\alpha_i})$$
$$= \Big(\frac{1}{2} - \frac{1}{\alpha_i}\Big)H_i + \frac{1}{\alpha_i}\log(1 + 2^{-\alpha_i}) \qquad (30)$$

for some $H_i \in [0,1]$. We can optimize (30) over $\alpha_i$ to find the minimum gap
for a given $H_i$. Maximizing over $H_i \in [0,1]$ then gives the worst-case gap.
Straightforward calculus yields the saddle-point:

$$\max_{H_i\in[0,1]} \inf_{\alpha_i>0} \Big\{\Big(\frac{1}{2} - \frac{1}{\alpha_i}\Big)H_i + \frac{1}{\alpha_i}\log(1 + 2^{-\alpha_i})\Big\}$$
$$= \inf_{\alpha_i>0} \max_{H_i\in[0,1]} \Big\{\Big(\frac{1}{2} - \frac{1}{\alpha_i}\Big)H_i + \frac{1}{\alpha_i}\log(1 + 2^{-\alpha_i})\Big\}$$
$$= \frac{1}{2}\log\Big(\frac{5}{4}\Big) < 0.161,$$

which is achieved for $\alpha_i = 2$ and any $H_i \in [0,1]$. □
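The constant in Corollary 1 can be reproduced by evaluating the right-hand side of (30) directly. The sketch below uses hypothetical function names and a coarse grid chosen only for illustration; it is a numerical check rather than a proof of the saddle-point claim.

```python
import math

def gap(alpha, h):
    """Right-hand side of (30): (1/2 - 1/alpha) * H + log2(1 + 2^-alpha) / alpha."""
    return (0.5 - 1.0 / alpha) * h + math.log2(1 + 2.0 ** (-alpha)) / alpha

def worst_case_gap(alpha, points=101):
    """Maximum of the gap over a grid of H values in [0, 1]."""
    return max(gap(alpha, k / (points - 1)) for k in range(points))

# At alpha = 2 the coefficient of H vanishes, so the gap is (1/2) log2(5/4).
print(worst_case_gap(2.0), 0.5 * math.log2(1.25))

# Other scalings give a larger worst-case gap on this grid.
for alpha in [0.5, 1.0, 1.5, 2.5, 3.0, 4.0]:
    print(alpha, worst_case_gap(alpha))
```

For $\alpha_i < 2$ the worst case is at $H_i = 0$ and for $\alpha_i > 2$ it is at $H_i = 1$, which is why the choice $\alpha_i = 2$ balances the two regimes.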

Remark 3. We note briefly that this estimate can potentially be improved if 
one knows more about the source distribution. 



6 Concluding Remarks 

One immediate direction for further work would be to extend our results to 
more than two encoders. For the CEO problem, our results can be extended to 
an arbitrary number of encoders. This extension is proved in Appendix B.

On the other hand, generalizing the results for the two-encoder source coding 
problem with distortion constraints on Yi and Y2 poses a significant challenge. 
The obvious point of difficulty in the proof is extending the interpolation ar- 
gument to higher dimensions so that it yields a distribution with the desired 
properties. In fact, a "quick-fix" to the interpolation argument alone would not 
be sufficient since this would imply that the Berger-Tung inner bound is tight for 
more than two encoders. This is known to be false (even for the logarithmic loss 
distortion measure) since the Berger-Tung achievability scheme is not optimal 
for the lossless modulo-sum problem studied by Körner and Marton in [21].

A Cardinality Bounds on Auxiliary Random
Variables

In order to obtain tight cardinality bounds on the auxiliary random variables
used throughout this paper, we refer to a recent result by Jana. In [17], the
author carefully applies the Caratheodory-Fenchel-Eggleston theorem in order
to obtain tight cardinality bounds on the auxiliary random variables in the
Berger-Tung inner bound. This result extends the results and techniques employed
by Gu and Effros for the Wyner-Ahlswede-Korner problem [22], and by
Gu, Jana, and Effros for the Wyner-Ziv problem [23]. We now state Jana's
result, appropriately modified for our purposes:

Consider an arbitrary joint distribution $p(v,y_1,\dots,y_m)$ with random variables
$V,Y_1,\dots,Y_m$ coming from alphabets $\mathcal{V},\mathcal{Y}_1,\dots,\mathcal{Y}_m$, respectively.

Let $d_l : \mathcal{V} \times \hat{\mathcal{V}}_l \to \mathbb{R}$, $1 \le l \le L$, be arbitrary distortion measures defined for
possibly different reproduction alphabets $\hat{\mathcal{V}}_l$.

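Definition 6. Define $\mathcal{A}^*$ to be the set of $(m+L)$-vectors
$(R_1,\dots,R_m,D_1,\dots,D_L)$ satisfying the following conditions:

1. auxiliary random variables $U_1,\dots,U_m$ exist such that

$$\sum_{i\in\mathcal{I}} R_i \ge I(Y_\mathcal{I};U_\mathcal{I}|U_{\mathcal{I}^c}), \quad \text{for all } \mathcal{I} \subseteq \{1,\dots,m\}, \text{ and}$$

2. mappings $\psi_l : \mathcal{U}_1 \times \dots \times \mathcal{U}_m \to \hat{\mathcal{V}}_l$, $1 \le l \le L$, exist such that

$$\mathbb{E}\, d_l(V,\psi_l(U_1,\dots,U_m)) \le D_l$$

for some joint distribution

$$p(v,y_1,\dots,y_m)\prod_{j=1}^m p(u_j|y_j).$$

Lemma 4 (Lemma 2.2 from [17]). Every extreme point of $\mathcal{A}^*$ corresponds to
some choice of auxiliary variables $U_1,\dots,U_m$ with alphabet sizes $|\mathcal{U}_j| \le |\mathcal{Y}_j|$,
$1 \le j \le m$.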

In order to obtain the cardinality bounds for the CEO problem, we simply
let $L = 1$, $V = X$, and $\hat{\mathcal{V}}_1 = \hat{\mathcal{X}}$. Defining

$$d_1(x,\hat{x}) = \log\Big(\frac{1}{\hat{x}(x)}\Big),$$

we see that $\mathcal{RD}^\star_{CEO} = \mathrm{conv}(\mathcal{A}^*)$, where $\mathrm{conv}(\mathcal{A}^*)$ denotes the convex hull
of $\mathcal{A}^*$. Therefore, Lemma 4 implies that all extreme points of $\mathcal{RD}^\star_{CEO}$ are
achieved with a choice of auxiliary random variables $U_1,\dots,U_m$ with alphabet
sizes $|\mathcal{U}_j| \le |\mathcal{Y}_j|$, $1 \le j \le m$. By timesharing between extreme points, any point
in $\mathcal{RD}^\star_{CEO}$ can be achieved for these alphabet sizes.

Obtaining the cardinality bounds for the multiterminal source coding problem
proceeds in a similar fashion. In particular, let $L = m = 2$, $V = (Y_1,Y_2)$,
and $\hat{\mathcal{V}}_j = \hat{\mathcal{Y}}_j$, $j = 1,2$. Defining

$$d_j((y_1,y_2),\hat{y}_j) = \log\Big(\frac{1}{\hat{y}_j(y_j)}\Big) \quad \text{for } j = 1,2,$$

we see that $\mathcal{RD}^\star = \mathrm{conv}(\mathcal{A}^*)$. In this case, Lemma 4 implies that all extreme
points of $\mathcal{RD}^\star$ are achieved with a choice of auxiliary random variables $U_1,U_2$
with alphabet sizes $|\mathcal{U}_j| \le |\mathcal{Y}_j|$, $1 \le j \le 2$. By timesharing between extreme
points, any point in $\mathcal{RD}^\star$ can be achieved for these alphabet sizes.

In order to obtain cardinality bounds on the timesharing variable $Q$, we can
apply Caratheodory's theorem (cf. [24]). In particular, if $C \subset \mathbb{R}^n$ is compact,
then any point in $\mathrm{conv}(C)$ is a convex combination of at most $n+1$ points of
$C$. Taking $C$ to be the closure of the set of extreme points of $\mathcal{A}^*$ is sufficient
for our purposes (boundedness of $C$ can be dealt with by a standard truncation
argument).

B Extension of CEO Results to m Encoders 

In this appendix, we prove the generalization of Theorem 3 to $m$ encoders, which
essentially amounts to extending the argument in the proof of Theorem 3 to the
general case. We begin by stating the $m$-encoder generalizations of Theorems
1 and 2, the proofs of which are trivial extensions of the proofs given for the
two-encoder case and are therefore omitted.

Definition 7. Let $\mathcal{RD}^i_{CEO,m}$ be the set of all $(R_1,\dots,R_m,D)$ satisfying

$$\sum_{i\in\mathcal{I}} R_i \ge I(Y_\mathcal{I};U_\mathcal{I}|U_{\mathcal{I}^c},Q) \quad \text{for all } \mathcal{I} \subseteq \{1,\dots,m\}$$
$$D \ge H(X|U_1,\dots,U_m,Q)$$

for some joint distribution $p(q)p(x)\prod_{i=1}^m p(y_i|x)p(u_i|y_i,q)$.

Theorem 10. All rate distortion vectors $(R_1,\dots,R_m,D) \in \mathcal{RD}^i_{CEO,m}$ are
achievable.

Definition 8. Let $\mathcal{RD}^o_{CEO,m}$ be the set of all $(R_1,\dots,R_m,D)$ satisfying

$$\sum_{i\in\mathcal{I}} R_i \ge \sum_{i\in\mathcal{I}} I(U_i;Y_i|X,Q) + H(X|U_{\mathcal{I}^c},Q) - D \quad \text{for all } \mathcal{I} \subseteq \{1,\dots,m\} \qquad (31)$$
$$D \ge H(X|U_1,\dots,U_m,Q) \qquad (32)$$

for some joint distribution $p(q)p(x)\prod_{i=1}^m p(y_i|x)p(u_i|y_i,q)$.

Theorem 11. If $(R_1,\dots,R_m,D)$ is strict-sense achievable, then
$(R_1,\dots,R_m,D) \in \mathcal{RD}^o_{CEO,m}$.

Given the definitions of $\mathcal{RD}^i_{CEO,m}$ and $\mathcal{RD}^o_{CEO,m}$, the generalization of Theorem
3 to $m$ encoders is an immediate consequence of the following lemma:

Lemma 5. $\mathcal{RD}^o_{CEO,m} \subseteq \mathcal{RD}^i_{CEO,m}$.

Proof. Suppose $(R_1,\dots,R_m,D) \in \mathcal{RD}^o_{CEO,m}$; then by definition there exist $p(q)$
and conditional distributions $\{p(u_i|y_i,q)\}_{i=1}^m$ so that (31) and (32) are satisfied.
For the joint distribution corresponding to $p(q)$ and conditional distributions
$\{p(u_i|y_i,q)\}_{i=1}^m$, define $\mathcal{P}_D \subset \mathbb{R}^m$ to be the polytope defined by the inequalities
(31). Now, to show $(R_1,\dots,R_m,D) \in \mathcal{RD}^i_{CEO,m}$, it suffices to show that each
extreme point of $\mathcal{P}_D$ is dominated by a point in $\mathcal{RD}^i_{CEO,m}$ that achieves distortion
at most $D$.

To this end, define the set function $f : 2^{[m]} \to \mathbb{R}$ as follows:

$$f(\mathcal{I}) := I(Y_\mathcal{I};U_\mathcal{I}|U_{\mathcal{I}^c},Q) - \big(D - H(X|U_1,\dots,U_m,Q)\big)$$
$$= \sum_{i\in\mathcal{I}} I(U_i;Y_i|X,Q) + H(X|U_{\mathcal{I}^c},Q) - D.$$

It can be verified that the function $f$ and the function $f^+(\mathcal{I}) = \max\{f(\mathcal{I}),0\}$
are supermodular functions (see Appendix C). By construction, $\mathcal{P}_D$ is equal to
the set of $(R_1,\dots,R_m)$ which satisfy:

$$\sum_{i\in\mathcal{I}} R_i \ge f^+(\mathcal{I}).$$

It follows by basic results in submodular optimization (see Appendix C)
that, for a linear ordering $i_1 \prec i_2 \prec \dots \prec i_m$ of $\{1,\dots,m\}$, an extreme point of
$\mathcal{P}_D$ can be greedily computed as follows:

$$R_{i_j} = f^+(\{i_1,\dots,i_j\}) - f^+(\{i_1,\dots,i_{j-1}\}) \quad \text{for } j = 1,\dots,m.$$

Furthermore, all extreme points of $\mathcal{P}_D$ can be enumerated by looking over all
linear orderings $i_1 \prec i_2 \prec \dots \prec i_m$ of $\{1,\dots,m\}$. Each ordering of $\{1,\dots,m\}$ is
analyzed in the same manner, hence we assume (for notational simplicity) that
the ordering we consider is the natural ordering $i_j = j$.

Let $j$ be the first index for which $R_j > 0$. Then, by construction,

$$R_k = I(U_k;Y_k|U_{k+1},\dots,U_m,Q) \quad \text{for all } k > j.$$

Furthermore, we must have $f(\{1,\dots,j'\}) \le 0$ for all $j' < j$. Thus, $R_j$ can be
expressed as

$$R_j = \sum_{i=1}^j I(Y_i;U_i|X,Q) + H(X|U_{j+1},\dots,U_m,Q) - D$$
$$= I(Y_j;U_j|U_{j+1},\dots,U_m,Q) + f(\{1,\dots,j-1\})$$
$$= (1-\theta)\, I(Y_j;U_j|U_{j+1},\dots,U_m,Q),$$

where $\theta \in [0,1)$ is defined as:

$$\theta := \frac{-f(\{1,\dots,j-1\})}{I(Y_j;U_j|U_{j+1},\dots,U_m,Q)}$$
$$= \frac{D - H(X|U_1,\dots,U_m,Q) - I(U_1,\dots,U_{j-1};Y_1,\dots,Y_{j-1}|U_j,\dots,U_m,Q)}{I(Y_j;U_j|U_{j+1},\dots,U_m,Q)}.$$

By the results of Theorem 10, the rates $(R_1,\dots,R_m)$ permit the following
coding scheme: For a fraction $(1-\theta)$ of the time, a codebook can be used
that allows the decoder to recover $U_j^n,\dots,U_m^n$ with high probability. The other
fraction $\theta$ of the time, a codebook can be used that allows the decoder to recover
$U_{j+1}^n,\dots,U_m^n$ with high probability. As $n \to \infty$, this coding scheme can achieve
distortion

$$\tilde{D} = (1-\theta)H(X|U_j,\dots,U_m,Q) + \theta H(X|U_{j+1},\dots,U_m,Q)$$
$$= H(X|U_j,\dots,U_m,Q) + \theta I(X;U_j|U_{j+1},\dots,U_m,Q)$$
$$= H(X|U_j,\dots,U_m,Q) + \frac{I(X;U_j|U_{j+1},\dots,U_m,Q)}{I(Y_j;U_j|U_{j+1},\dots,U_m,Q)}$$
$$\qquad \times \big[D - H(X|U_1,\dots,U_m,Q) - I(U_1,\dots,U_{j-1};Y_1,\dots,Y_{j-1}|U_j,\dots,U_m,Q)\big]$$
$$\le H(X|U_j,\dots,U_m,Q) + D - H(X|U_1,\dots,U_m,Q)$$
$$\qquad - I(U_1,\dots,U_{j-1};Y_1,\dots,Y_{j-1}|U_j,\dots,U_m,Q) \qquad (33)$$
$$= D + I(X;U_1,\dots,U_{j-1}|U_j,\dots,U_m,Q)$$
$$\qquad - I(U_1,\dots,U_{j-1};Y_1,\dots,Y_{j-1}|U_j,\dots,U_m,Q)$$
$$= D - I(U_1,\dots,U_{j-1};Y_1,\dots,Y_{j-1}|X,U_j,\dots,U_m,Q)$$
$$\le D. \qquad (34)$$



In the preceding string of inequalities, (33) follows since $U_j$ is conditionally
independent of everything else given $(Y_j,Q)$, and (34) follows from the
non-negativity of mutual information.

Therefore, for every extreme point $(R_1,\dots,R_m)$ of $\mathcal{P}_D$, the point
$(R_1,\dots,R_m,D)$ lies in $\mathcal{RD}^i_{CEO,m}$. This proves the lemma. □

Finally, we remark that the results of Appendix A imply that it suffices to
consider auxiliary random variables $U_1,\dots,U_m$ with alphabet sizes $|\mathcal{U}_j| \le |\mathcal{Y}_j|$,
$1 \le j \le m$. The timesharing variable $Q$ requires an alphabet size bounded by
$|\mathcal{Q}| \le m+2$.



C Supermodular Functions 

In this appendix, we review some basic results in submodular optimization that
were used in Appendix B to prove Lemma 5. We tailor our statements toward
supermodularity, since this is the property we require in Appendix B.
We begin by defining a supermodular function.

Definition 9. Let $E = \{1,\dots,n\}$ be a finite set. A function $s : 2^E \to \mathbb{R}$ is
supermodular if for all $S,T \subseteq E$

$$s(S) + s(T) \le s(S \cap T) + s(S \cup T). \qquad (35)$$

One of the fundamental results in submodular optimization is that a greedy
algorithm minimizes a linear function over a supermodular polyhedron. By
varying the linear function to be minimized, all extreme points of the supermodular
polyhedron can be enumerated. In particular, define the supermodular
polyhedron $\mathcal{P}(s) \subset \mathbb{R}^n$ to be the set of $x \in \mathbb{R}^n$ satisfying

$$\sum_{i\in T} x_i \ge s(T) \quad \text{for all } T \subseteq E.$$

The following theorem provides an algorithm that enumerates the extreme
points of $\mathcal{P}(s)$.



Theorem 12 (See [25]-[27]). For a linear ordering $e_1 \prec e_2 \prec \dots \prec e_n$ of the
elements in $E$, Algorithm C.1 returns an extreme point $v$ of $\mathcal{P}(s)$. Moreover,
all extreme points of $\mathcal{P}(s)$ can be enumerated by considering all linear orderings
of the elements of $E$.

Algorithm C.1: GREEDY($s$, $E$, $\prec$)

comment: Returns extreme point $v$ of $\mathcal{P}(s)$ corresponding to the ordering $\prec$.

for $i = 1,\dots,n$
    Set $v_i = s(\{e_1,e_2,\dots,e_i\}) - s(\{e_1,e_2,\dots,e_{i-1}\})$
return ($v$)

Proof. See [25]-[27]. □
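Algorithm C.1 can be sketched in a few lines of code. The version below is illustrative: the function names, the use of frozensets to represent subsets, and the toy supermodular function $s(T) = |T|^2$ are assumptions made for this example and do not come from the paper. The sketch assumes $s(\emptyset) = 0$, which is what makes the greedy point feasible.

```python
from itertools import combinations, permutations

def greedy_extreme_point(s, ordering):
    """Greedy point for the polyhedron {x : sum_{i in T} x_i >= s(T)}.

    s: function from a frozenset of elements to a real number,
       assumed supermodular with s(frozenset()) == 0.
    ordering: sequence listing every ground-set element exactly once.
    """
    v = {}
    prefix = frozenset()
    for e in ordering:
        extended = prefix | {e}
        v[e] = s(extended) - s(prefix)   # marginal value of e given its predecessors
        prefix = extended
    return v

def is_feasible(s, v):
    """Check sum_{i in T} v_i >= s(T) for every nonempty subset T."""
    elements = list(v)
    for size in range(1, len(elements) + 1):
        for subset in combinations(elements, size):
            if sum(v[e] for e in subset) < s(frozenset(subset)) - 1e-12:
                return False
    return True

# Toy supermodular function: a convex function of the cardinality.
def square_of_size(subset):
    return len(subset) ** 2

print(greedy_extreme_point(square_of_size, (1, 2, 3)))
print(all(is_feasible(square_of_size, greedy_extreme_point(square_of_size, order))
          for order in permutations((1, 2, 3))))
```

Running the greedy step over every permutation of the ground set enumerates the candidate extreme points, mirroring the second claim of Theorem 12; the brute-force feasibility check is exponential in the ground-set size and is only meant for small examples.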



Theorem 12 is the key tool we employ to establish Lemma 5. In order to
apply it, we require the following lemma.

Lemma 6. For any joint distribution of the form
$p(q)p(x)\prod_{i=1}^m p(y_i|x)p(u_i|y_i,q)$ and fixed $D \in \mathbb{R}$, define the set function
$f : 2^{[m]} \to \mathbb{R}$ as:

$$f(\mathcal{I}) := I(Y_\mathcal{I};U_\mathcal{I}|U_{\mathcal{I}^c},Q) - \big(D - H(X|U_1,\dots,U_m,Q)\big) \qquad (36)$$
$$= \sum_{i\in\mathcal{I}} I(U_i;Y_i|X,Q) + H(X|U_{\mathcal{I}^c},Q) - D,$$

and the corresponding non-negative set function $f^+ : 2^{[m]} \to \mathbb{R}$ as $f^+ =
\max\{f,0\}$. The functions $f$ and $f^+$ are supermodular.

Proof. In order to verify that $f$ is supermodular, it suffices to check that the
function $f'(\mathcal{I}) = I(Y_\mathcal{I};U_\mathcal{I}|U_{\mathcal{I}^c},Q)$ is supermodular since the latter two terms
in (36) are constant. To this end, consider sets $T,S \subseteq \{1,\dots,m\}$ and observe
that:

$$f'(S) + f'(T) = I(Y_S;U_S|U_{S^c},Q) + I(Y_T;U_T|U_{T^c},Q)$$
$$= H(U_S|U_{S^c},Q) - H(U_S|Y_S,Q) + H(U_T|U_{T^c},Q) - H(U_T|Y_T,Q)$$
$$= H(U_S|U_{S^c},Q) + H(U_T|U_{T^c},Q)$$
$$\quad - H(U_{S\cup T}|Y_{S\cup T},Q) - H(U_{S\cap T}|Y_{S\cap T},Q) \qquad (37)$$
$$= H(U_{S\setminus T}|U_{S^c},Q) + H(U_{S\cap T}|U_{(S\cap T)^c},Q) + H(U_T|U_{T^c},Q)$$
$$\quad - H(U_{S\cup T}|Y_{S\cup T},Q) - H(U_{S\cap T}|Y_{S\cap T},Q) \qquad (38)$$
$$= H(U_{S\setminus T}|U_{S^c},Q) + H(U_T|U_{T^c},Q) - H(U_{S\cup T}|Y_{S\cup T},Q)$$
$$\quad + I(U_{S\cap T};Y_{S\cap T}|U_{(S\cap T)^c},Q)$$
$$\le H(U_{S\setminus T}|U_{(S\cup T)^c},Q) + H(U_T|U_{T^c},Q) - H(U_{S\cup T}|Y_{S\cup T},Q)$$
$$\quad + I(U_{S\cap T};Y_{S\cap T}|U_{(S\cap T)^c},Q) \qquad (39)$$
$$= I(U_{S\cup T};Y_{S\cup T}|U_{(S\cup T)^c},Q) + I(U_{S\cap T};Y_{S\cap T}|U_{(S\cap T)^c},Q)$$
$$= f'(S \cap T) + f'(S \cup T).$$
The labeled steps above can be justified as follows:

• (37) follows since $U_i$ is conditionally independent of everything else given $(Y_i, Q)$.

• (38) is simply the chain rule.

• (39) follows since conditioning reduces entropy.

Next, we show that $f^+ = \max\{f, 0\}$ is supermodular. Observe first that
$f$ is monotone increasing, i.e., if $S \subseteq T$, then $f(S) \le f(T)$. Thus, fixing
$S, T \subseteq \{1, \dots, m\}$, we can assume without loss of generality that

$$f(S \cap T) \le f(S) \le f(T) \le f(S \cup T).$$

If $f(S \cap T) \ge 0$, then (35) is satisfied for $s = f^+$ by the supermodularity of
$f$. On the other hand, if $f(S \cup T) \le 0$, then (35) is a tautology for $s = f^+$.
Therefore, it suffices to check the following three cases:

• Case 1: $f(S \cap T) < 0 \le f(S) \le f(T) \le f(S \cup T)$. In this case, the
supermodularity of $f$ and the fact that $f^+ \ge f$ imply:

$$f^+(S \cup T) + f^+(S \cap T) \ge f(S \cup T) + f(S \cap T) \ge f(S) + f(T) = f^+(S) + f^+(T).$$

• Case 2: $f(S \cap T) \le f(S) < 0 \le f(T) \le f(S \cup T)$. Since $f$ is monotone
increasing, we have:

$$f^+(S \cup T) + f^+(S \cap T) = f(S \cup T) + 0 \ge f(T) + 0 = f^+(S) + f^+(T).$$

• Case 3: $f(S \cap T) \le f(S) \le f(T) < 0 \le f(S \cup T)$. By definition of $f^+$:

$$f^+(S \cup T) + f^+(S \cap T) = f(S \cup T) + 0 \ge 0 + 0 = f^+(S) + f^+(T).$$

Hence, $f^+ = \max\{f, 0\}$ is supermodular. □
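The role of monotonicity in the second half of the proof can be checked numerically on small ground sets. The following is a minimal brute-force sketch: the helper names and the two example set functions are hypothetical and chosen here only for illustration. It tests inequality (35) over all pairs of subsets, once for a monotone supermodular function and once for a non-monotone one, whose positive part fails to be supermodular.

```python
from itertools import combinations


def subsets(ground_set):
    items = sorted(ground_set)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_supermodular(s, ground_set, tol=1e-12):
    """Brute-force check of inequality (35) over all pairs of subsets."""
    subs = subsets(ground_set)
    return all(s(S) + s(T) <= s(S & T) + s(S | T) + tol
               for S in subs for T in subs)


def f(I):
    # Hypothetical monotone supermodular function taking negative values.
    return len(I) ** 2 - 5


def f_plus(I):
    return max(f(I), 0)


WEIGHTS = {1: 1.0, 2: -3.0}


def g(I):
    # Hypothetical modular (hence supermodular) function that is NOT monotone.
    return sum(WEIGHTS[i] for i in I)


def g_plus(I):
    return max(g(I), 0.0)
```

Here `f_plus` passes the check on $\{1, 2, 3, 4\}$, while `g_plus` fails on $\{1, 2\}$ with $S = \{1\}$ and $T = \{2\}$, which is consistent with the proof's use of monotonicity.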



D Amplifying a Pointwise Convexity Constraint 

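Lemma 7. Let $r_1, r_2 \in \mathbb{R}$ be given, and suppose $f_1 : K \to \mathbb{R}$ and $f_2 : K \to \mathbb{R}$
are continuous functions defined on a compact domain $K \subset \mathbb{R}^n$. If there exists
a function $h : [0,1] \to K$ satisfying

$$t (f_1 \circ h)(t) + (1-t)(f_2 \circ h)(t) \le t r_1 + (1-t) r_2 \quad \text{for all } t \in [0,1], \tag{40}$$

then there exist $x_1^*, x_2^* \in K$ and $t^* \in [0,1]$ for which

\begin{align*}
t^* f_1(x_1^*) + (1-t^*) f_1(x_2^*) &\le r_1 \\
t^* f_2(x_1^*) + (1-t^*) f_2(x_2^*) &\le r_2.
\end{align*}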

Before we prove the lemma, we make a few remarks. At first glance, this
lemma appears somewhat bizarre. Indeed, the set $K$ need only be compact
(e.g., connectedness is not required) and $h$ can be an arbitrarily complicated
function, as long as it satisfies (40). The strange nature of the lemma is echoed
by the proof in that we merely prove the existence of the desired $x_1^*$, $x_2^*$, and
$t^*$; no further information is obtained. Stripped to its core, the existence of the
desired $x_1^*$, $x_2^*$, and $t^*$ essentially follows from the pigeonhole principle, which
manifests itself in the sequential compactness of $K$.

Despite its strange nature, Lemma 7 is crucial in establishing the converse
result for the multiterminal source coding problem under logarithmic loss. In
this application, $K$ is taken to be a closed subset of a finite-dimensional
probability simplex and $f_1, f_2$ are conditional entropies evaluated for probability
distributions in $K$.

Finally, we remark that Lemma 7 can be generalized to a certain extent.
For example, the function $h$ need only be defined on a dense subset of $[0,1]$ and
the set $K$ can be a more general sequentially compact space.

Proof of Lemma 7. Since $f_1, f_2$ are continuous* and $K$ is compact, there exists
$M < \infty$ such that $f_1$ and $f_2$ are bounded from above and below by $M$ and $-M$,
respectively. Fix $\epsilon > 0$, and partition the interval $[0,1]$ as $0 = t_1 < t_2 < \dots <
t_m = 1$, such that $|t_{j+1} - t_j| < \epsilon / M$. For convenience define $x_{t_j} := h(t_j)$ when $t_j$
is in the partition.



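* Although not required for our purposes, we can assume $f_1$ and $f_2$ are defined and
continuous over all of $\mathbb{R}^n$. This is a consequence of the Tietze extension theorem.

Now, for $i = 1, 2$ define piecewise-linear functions $g_1(t), g_2(t)$ on $[0,1]$ by:

$$g_i(t) = \begin{cases} f_i(x_{t_j}) & \text{if } t = t_j \text{ is in the partition} \\ \theta f_i(x_{t_j}) + (1-\theta) f_i(x_{t_{j+1}}) & \text{if } t \text{ is in the interval } (t_j, t_{j+1}), \end{cases} \tag{41}$$

where $\theta \in (0,1)$ is chosen so that $t = \theta t_j + (1-\theta) t_{j+1}$ when $t$ is in the interval
$(t_j, t_{j+1})$.

With $g_1(t)$ and $g_2(t)$ defined in this manner, suppose $t = \theta t_j + (1-\theta) t_{j+1}$
for some $j$ and $\theta$. Then some straightforward algebra yields:

\begin{align*}
t g_1(t) + (1-t) g_2(t) &= (\theta t_j + (1-\theta) t_{j+1}) \big(\theta f_1(x_{t_j}) + (1-\theta) f_1(x_{t_{j+1}})\big) \\
&\quad + (1 - \theta t_j - (1-\theta) t_{j+1}) \big(\theta f_2(x_{t_j}) + (1-\theta) f_2(x_{t_{j+1}})\big) \\
&= \theta^2 \big[t_j f_1(x_{t_j}) + (1-t_j) f_2(x_{t_j})\big] \\
&\quad + (1-\theta)^2 \big[t_{j+1} f_1(x_{t_{j+1}}) + (1-t_{j+1}) f_2(x_{t_{j+1}})\big] \\
&\quad + \theta(1-\theta) \big[(1-t_j) f_2(x_{t_{j+1}}) + (1-t_{j+1}) f_2(x_{t_j}) \\
&\qquad\qquad + t_j f_1(x_{t_{j+1}}) + t_{j+1} f_1(x_{t_j})\big] \\
&\le \theta^2 \big[t_j f_1(x_{t_j}) + (1-t_j) f_2(x_{t_j})\big] \\
&\quad + (1-\theta)^2 \big[t_{j+1} f_1(x_{t_{j+1}}) + (1-t_{j+1}) f_2(x_{t_{j+1}})\big] \\
&\quad + \theta(1-\theta) \big[(1-t_{j+1}) f_2(x_{t_{j+1}}) + (1-t_j) f_2(x_{t_j}) \\
&\qquad\qquad + t_j f_1(x_{t_j}) + t_{j+1} f_1(x_{t_{j+1}})\big] + \epsilon \\
&\le \theta^2 \big[t_j r_1 + (1-t_j) r_2\big] \\
&\quad + (1-\theta)^2 \big[t_{j+1} r_1 + (1-t_{j+1}) r_2\big] \\
&\quad + \theta(1-\theta) \big[(1-t_{j+1}) r_2 + (1-t_j) r_2 + t_j r_1 + t_{j+1} r_1\big] + \epsilon \\
&= (\theta t_j + (1-\theta) t_{j+1}) r_1 + (1 - \theta t_j - (1-\theta) t_{j+1}) r_2 + \epsilon \\
&= t r_1 + (1-t) r_2 + \epsilon, \tag{42}
\end{align*}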

where the first inequality follows since $|t_{j+1} - t_j|$ is small, and the second
inequality follows from the fact that (40) holds for each $t_j$ in the partition.
Notably, this implies that it is impossible to have

$$g_1(t) > r_1 + \epsilon \quad \text{and} \quad g_2(t) > r_2 + \epsilon$$

hold simultaneously for any $t \in [0,1]$, else we would obtain a contradiction to
(42). Also, since we included the endpoints $t_1 = 0$ and $t_m = 1$ in the partition,
we have the following two inequalities:

$$g_1(1) \le r_1, \quad \text{and} \quad g_2(0) \le r_2.$$

Combining these observations with the fact that $g_1(t)$ and $g_2(t)$ are
continuous, there must exist some $t^* \in [0,1]$ for which

$$g_1(t^*) \le r_1 + \epsilon, \quad \text{and} \quad g_2(t^*) \le r_2 + \epsilon$$

simultaneously. An illustration of this is given in Figure 5, which is a mere
variation on the classical intermediate value theorem.

Figure 5: A parametric plot of the function $\varphi : t \mapsto (g_1(t), g_2(t))$. Since $\varphi(t)$ is
continuous, starts with $g_2(0) \le r_2 + \epsilon$, ends with $g_1(1) \le r_1 + \epsilon$, and doesn't
intersect the shaded area, $\varphi(t)$ must pass through the lower-left region.

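Applying this result with $\epsilon = 1/n$, and noting that by construction each $g_i(t^*)$ is a
convex combination of $f_i$ evaluated at two points of $K$ with a common weight, we can
find a sequence $\{x_1^{(n)}, x_2^{(n)}, t^{(n)}\}_{n=1}^{\infty}$ satisfying

\begin{align*}
t^{(n)} f_1(x_1^{(n)}) + (1 - t^{(n)}) f_1(x_2^{(n)}) &\le r_1 + \frac{1}{n} \\
t^{(n)} f_2(x_1^{(n)}) + (1 - t^{(n)}) f_2(x_2^{(n)}) &\le r_2 + \frac{1}{n}
\end{align*}

for each $n \ge 1$. Since $K \times K \times [0,1]$ is sequentially compact, there exists a
convergent subsequence $\{n_j\}_{j=1}^{\infty}$ such that $(x_1^{(n_j)}, x_2^{(n_j)}, t^{(n_j)}) \to (x_1^*, x_2^*, t^*) \in
K \times K \times [0,1]$. The continuity of $f_1$ and $f_2$ then applies to yield the desired
result. □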
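The discretization step of the proof can be mimicked numerically. The sketch below is illustrative only and is not part of the proof: the function name `find_mixture`, the grid over the mixing weight, and the toy choices of $f_1$, $f_2$, and $h$ are assumptions made here. It partitions $[0,1]$ with spacing below $\epsilon / M$, evaluates $h$ on the partition, and searches each pair of consecutive partition points for a weight $\theta$ with both convex combinations within $\epsilon$ of the targets.

```python
import math


def find_mixture(f1, f2, h, r1, r2, eps, M, theta_steps=400):
    """Search for (x1, x2, theta) with
    theta * f_i(x1) + (1 - theta) * f_i(x2) <= r_i + eps for i = 1, 2."""
    m = math.ceil(M / eps) + 2          # partition spacing 1/(m-1) < eps/M
    ts = [j / (m - 1) for j in range(m)]
    xs = [h(t) for t in ts]
    for j in range(m - 1):
        a, b = xs[j], xs[j + 1]
        for k in range(theta_steps + 1):
            theta = k / theta_steps
            g1 = theta * f1(a) + (1 - theta) * f1(b)
            g2 = theta * f2(a) + (1 - theta) * f2(b)
            if g1 <= r1 + eps and g2 <= r2 + eps:
                return a, b, theta
    return None


# Hypothetical toy instance on K = [0, 1]; h is deliberately discontinuous.
f1 = lambda x: x
f2 = lambda x: 1.0 - x
h = lambda t: 1.0 if t < 0.5 else 0.0
result = find_mixture(f1, f2, h, r1=0.5, r2=0.5, eps=0.01, M=1.0)
```

In this toy instance, (40) holds with $r_1 = r_2 = 1/2$, and the search returns the two points $x_1 = 1$, $x_2 = 0$ with a weight close to $1/2$, even though neither point alone meets both targets.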



E Strengthening the Converse of Theorem 6

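In this appendix, we prove a stronger version of the converse of Theorem 6.
To be precise, let $\hat{\mathcal{Y}}_1^n$ and $\hat{\mathcal{Y}}_2^n$ denote the set of probability measures on $\mathcal{Y}_1^n$
and $\mathcal{Y}_2^n$, respectively. Let $d_1^n, d_2^n$ be the (extended) log loss distortion measures
defined as follows:

\begin{align*}
d_1^n(y_1^n, \hat{y}_1^n) &= \frac{1}{n} \log \frac{1}{\hat{y}_1^n(y_1^n)} \\
d_2^n(y_2^n, \hat{y}_2^n) &= \frac{1}{n} \log \frac{1}{\hat{y}_2^n(y_2^n)},
\end{align*}

where $\hat{y}_1^n(y_1^n)$ is the probability assigned to outcome $y_1^n \in \mathcal{Y}_1^n$ by the probability
measure $\hat{y}_1^n \in \hat{\mathcal{Y}}_1^n$. Similarly for $\hat{y}_2^n(y_2^n)$. Note that this extends the standard
definition of logarithmic loss to sequence reproductions.

Definition 10. We say that a tuple $(R_1, R_2, D_1, D_2)$ is sequence-achievable if,
for any $\epsilon > 0$, there exist encoding functions

\begin{align*}
f_1 &: \mathcal{Y}_1^n \to \{1, \dots, 2^{nR_1}\} \\
f_2 &: \mathcal{Y}_2^n \to \{1, \dots, 2^{nR_2}\},
\end{align*}

and decoding functions

\begin{align*}
\phi_1 &: \{1, \dots, 2^{nR_1}\} \times \{1, \dots, 2^{nR_2}\} \to \hat{\mathcal{Y}}_1^n \\
\phi_2 &: \{1, \dots, 2^{nR_1}\} \times \{1, \dots, 2^{nR_2}\} \to \hat{\mathcal{Y}}_2^n,
\end{align*}

which satisfy

\begin{align*}
\mathbb{E}\, d_1^n(Y_1^n, \hat{Y}_1^n) &\le D_1 + \epsilon \\
\mathbb{E}\, d_2^n(Y_2^n, \hat{Y}_2^n) &\le D_2 + \epsilon,
\end{align*}

where $\hat{Y}_1^n = \phi_1(f_1(Y_1^n), f_2(Y_2^n))$ and $\hat{Y}_2^n = \phi_2(f_1(Y_1^n), f_2(Y_2^n))$.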
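To make the sequence-reproduction distortion concrete, the following sketch evaluates a normalized sequence log loss of the form $\frac{1}{n}\log\frac{1}{\hat{y}^n(y^n)}$. The normalization by $n$, the base-2 logarithm, the function names, and the toy distributions are assumptions made here for illustration. When the reproduction is a product measure, the sequence loss reduces to the average of the per-symbol log losses; a non-product measure is also a valid reproduction.

```python
import math


def seq_log_loss(y_seq, q):
    """(1/n) * log2(1 / q(y^n)), where q maps length-n tuples to probabilities."""
    n = len(y_seq)
    return math.log2(1.0 / q[tuple(y_seq)]) / n


def symbol_log_loss(y, q):
    """Standard per-symbol log loss log2(1 / q(y))."""
    return math.log2(1.0 / q[y])


# Hypothetical per-symbol reproductions and the induced product measure (n = 2).
q1 = {0: 0.5, 1: 0.5}
q2 = {0: 0.25, 1: 0.75}
product = {(a, b): q1[a] * q2[b] for a in q1 for b in q2}

# Hypothetical non-product reproduction on pairs.
non_product = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}

loss_product = seq_log_loss((0, 1), product)
loss_average = 0.5 * (symbol_log_loss(0, q1) + symbol_log_loss(1, q2))
loss_non_product = seq_log_loss((0, 0), non_product)
```

Under these assumptions, `loss_product` and `loss_average` agree up to floating-point error, which is the sense in which the sequence measure extends the symbol-wise definition.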
Theorem 13. If $(R_1, R_2, D_1, D_2)$ is sequence-achievable, then

$$(R_1, R_2, D_1, D_2) \in \mathcal{RD}^i = \overline{\mathcal{RD}}^\star.$$

Proof. The theorem is an immediate consequence of Theorem 6 and Lemmas 8
and 9, which are given below. □

Remark 4. We refer to Theorem 13 as the "strengthened converse" of Theorem
6. Indeed, it states that enlarging the set of possible reproduction sequences to
include non-product distributions cannot attain better performance than when
the decoder is restricted to choosing a reproduction sequence from the set of
product distributions.

Lemma 8. If $(R_1, R_2, D_1, D_2)$ is sequence-achievable, then there exists a joint
distribution

$$p(y_1, y_2, u_1, u_2, q) = p(q)p(y_1, y_2)p(u_1|y_1, q)p(u_2|y_2, q)$$

and a $\tilde{D}_1 \le D_1$ which satisfies

\begin{align*}
\tilde{D}_1 &\ge H(Y_1 | U_1, U_2, Q) \\
D_2 &\ge \tilde{D}_1 + H(Y_2 | U_1, U_2, Q) - H(Y_1 | U_1, U_2, Q),
\end{align*}

and

\begin{align*}
R_1 &\ge H(Y_1 | U_2, Q) - \tilde{D}_1 \\
R_2 &\ge I(Y_2; U_2 | Y_1, Q) + H(Y_1 | U_1, Q) - \tilde{D}_1 \\
R_1 + R_2 &\ge I(Y_2; U_2 | Y_1, Q) + H(Y_1) - \tilde{D}_1.
\end{align*}

Proof. For convenience, let $F_1 = f_1(Y_1^n)$ and $F_2 = f_2(Y_2^n)$, where $f_1, f_2$ are the
encoding functions corresponding to a scheme which achieves $(R_1, R_2, D_1, D_2)$
(in the sequence-reproduction sense). Define $\tilde{D}_1 = \frac{1}{n} H(Y_1^n | F_1, F_2)$, so that:

$$n \tilde{D}_1 = H(Y_1^n | F_1, F_2). \tag{43}$$

Since $n D_1 \ge H(Y_1^n | F_1, F_2)$ by the strengthened version† of Lemma 1, we have
$\tilde{D}_1 \le D_1$ as desired. By definition of $\tilde{D}_1$, we immediately obtain the following
inequality:

$$n \tilde{D}_1 = \sum_{i=1}^n H(Y_{1,i} | F_1, F_2, Y_{1,i+1}^n) \ge \sum_{i=1}^n H(Y_{1,i} | F_1, F_2, Y_2^{i-1}, Y_{1,i+1}^n).$$

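Next, recall the Csiszár sum identity:

$$\sum_{i=1}^n I(Y_{1,i+1}^n; Y_{2,i} | Y_2^{i-1}, F_1, F_2) = \sum_{i=1}^n I(Y_2^{i-1}; Y_{1,i} | Y_{1,i+1}^n, F_1, F_2). \tag{44}$$

This, together with (43), implies the following inequality:

$$n D_2 \ge n \tilde{D}_1 + \sum_{i=1}^n \Big[ H(Y_{2,i} | F_1, F_2, Y_2^{i-1}, Y_{1,i+1}^n) - H(Y_{1,i} | F_1, F_2, Y_2^{i-1}, Y_{1,i+1}^n) \Big], \tag{45}$$

† See the comment in Section 3.3.

which we can verify as follows:

\begin{align*}
n D_2 &\ge H(Y_2^n | F_1, F_2) = \sum_{i=1}^n H(Y_{2,i} | F_1, F_2, Y_2^{i-1}) \\
&= H(Y_1^n | F_1, F_2) + \sum_{i=1}^n \Big[ H(Y_{2,i} | F_1, F_2, Y_2^{i-1}, Y_{1,i+1}^n) - H(Y_{1,i} | F_1, F_2, Y_2^{i-1}, Y_{1,i+1}^n) \Big] \\
&= n \tilde{D}_1 + \sum_{i=1}^n \Big[ H(Y_{2,i} | F_1, F_2, Y_2^{i-1}, Y_{1,i+1}^n) - H(Y_{1,i} | F_1, F_2, Y_2^{i-1}, Y_{1,i+1}^n) \Big].
\end{align*}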

Next, observe that we can lower bound $R_1$ as follows:

\begin{align*}
n R_1 \ge H(F_1) &\ge I(Y_1^n; F_1 | F_2) \\
&= \sum_{i=1}^n H(Y_{1,i} | F_2, Y_1^{i-1}) - H(Y_1^n | F_1, F_2) \\
&\ge \sum_{i=1}^n H(Y_{1,i} | F_2, Y_1^{i-1}, Y_2^{i-1}) - n \tilde{D}_1 \tag{46} \\
&= \sum_{i=1}^n H(Y_{1,i} | F_2, Y_2^{i-1}) - n \tilde{D}_1 \tag{47} \\
&\ge \sum_{i=1}^n H(Y_{1,i} | F_2, Y_2^{i-1}, Y_{1,i+1}^n) - n \tilde{D}_1. \tag{48}
\end{align*}

In the above string of inequalities, (46) follows from (43) and the fact that
conditioning reduces entropy. Equality (47) follows since $Y_{1,i} \leftrightarrow (F_2, Y_2^{i-1}) \leftrightarrow
Y_1^{i-1}$ form a Markov chain (in that order).

Next, we can obtain a lower bound on $R_2$:
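\begin{align*}
n R_2 \ge H(F_2) &\ge H(F_2 | F_1) = H(F_2 | F_1, Y_1^n) + I(Y_1^n; F_2 | F_1) \\
&= I(Y_2^n; F_2 | Y_1^n) + I(Y_1^n; F_2 | F_1) \tag{49} \\
&\ge \sum_{i=1}^n \Big[ I(Y_{2,i}; F_2 | Y_1^n, Y_2^{i-1}) + H(Y_{1,i} | F_1, Y_2^{i-1}, Y_{1,i+1}^n) \Big] - n \tilde{D}_1 \tag{50} \\
&= \sum_{i=1}^n \Big[ I(Y_{2,i}; F_2, Y_2^{i-1}, Y_1^{i-1} | Y_{1,i}, Y_2^{i-1}, Y_{1,i+1}^n) + H(Y_{1,i} | F_1, Y_2^{i-1}, Y_{1,i+1}^n) \Big] - n \tilde{D}_1 \tag{51} \\
&\ge \sum_{i=1}^n \Big[ I(Y_{2,i}; F_2, Y_2^{i-1} | Y_{1,i}, Y_2^{i-1}, Y_{1,i+1}^n) + H(Y_{1,i} | F_1, Y_2^{i-1}, Y_{1,i+1}^n) \Big] - n \tilde{D}_1. \tag{52}
\end{align*}

In the above string of inequalities, (50) follows from (43) and the chain rule, (51)
follows from the i.i.d. property of the sources, and (52) follows by monotonicity
of mutual information.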

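A lower bound on the sum-rate $R_1 + R_2$ can be obtained as follows:

\begin{align*}
n(R_1 + R_2) &\ge H(F_1) + H(F_2) \ge H(F_2) + H(F_1 | F_2) \\
&\ge I(F_2; Y_1^n, Y_2^n) + I(F_1; Y_1^n | F_2) \\
&= I(F_2; Y_1^n) + I(F_2; Y_2^n | Y_1^n) + I(F_1; Y_1^n | F_2) \\
&= I(F_2; Y_2^n | Y_1^n) + I(F_1, F_2; Y_1^n) \\
&\ge \sum_{i=1}^n \Big[ I(Y_{2,i}; F_2, Y_2^{i-1} | Y_{1,i}, Y_2^{i-1}, Y_{1,i+1}^n) + H(Y_{1,i}) \Big] - n \tilde{D}_1, \tag{53}
\end{align*}

where (53) follows in a manner similar to (49)-(52) in the lower bound on $R_2$.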
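Now, define $U_{1,i} = F_1$, $U_{2,i} = (F_2, Y_2^{i-1})$, and $Q_i = (Y_2^{i-1}, Y_{1,i+1}^n)$. Then we
can summarize our results so far as follows. Inequalities (44) and (45) become

\begin{align*}
\tilde{D}_1 &\ge \frac{1}{n} \sum_{i=1}^n H(Y_{1,i} | U_{1,i}, U_{2,i}, Q_i) \\
D_2 &\ge \tilde{D}_1 + \frac{1}{n} \sum_{i=1}^n \Big[ H(Y_{2,i} | U_{1,i}, U_{2,i}, Q_i) - H(Y_{1,i} | U_{1,i}, U_{2,i}, Q_i) \Big],
\end{align*}

and inequalities (48), (52), and (53) can be written as:

\begin{align*}
R_1 &\ge \frac{1}{n} \sum_{i=1}^n H(Y_{1,i} | U_{2,i}, Q_i) - \tilde{D}_1 \\
R_2 &\ge \frac{1}{n} \sum_{i=1}^n \Big[ I(Y_{2,i}; U_{2,i} | Y_{1,i}, Q_i) + H(Y_{1,i} | U_{1,i}, Q_i) \Big] - \tilde{D}_1 \\
R_1 + R_2 &\ge \frac{1}{n} \sum_{i=1}^n \Big[ I(Y_{2,i}; U_{2,i} | Y_{1,i}, Q_i) + H(Y_{1,i}) \Big] - \tilde{D}_1.
\end{align*}

Next, we note that $U_{1,i} \leftrightarrow Y_{1,i} \leftrightarrow Y_{2,i} \leftrightarrow U_{2,i}$ form a Markov chain (in that
order) conditioned on $Q_i$. Moreover, $Q_i$ is independent of $Y_{1,i}, Y_{2,i}$. Hence, a
standard timesharing argument proves the lemma. □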

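Lemma 9. Fix $(R_1, R_2, \tilde{D}_1, D_2)$. If there exists a joint distribution of the form

$$p(y_1, y_2, u_1, u_2, q) = p(q)p(y_1, y_2)p(u_1|y_1, q)p(u_2|y_2, q)$$

which satisfies

\begin{align*}
\tilde{D}_1 &\ge H(Y_1 | U_1, U_2, Q) \tag{54} \\
D_2 &\ge \tilde{D}_1 + H(Y_2 | U_1, U_2, Q) - H(Y_1 | U_1, U_2, Q), \tag{55}
\end{align*}

and

\begin{align*}
R_1 &\ge H(Y_1 | U_2, Q) - \tilde{D}_1 \tag{56} \\
R_2 &\ge I(Y_2; U_2 | Y_1, Q) + H(Y_1 | U_1, Q) - \tilde{D}_1 \tag{57} \\
R_1 + R_2 &\ge I(Y_2; U_2 | Y_1, Q) + H(Y_1) - \tilde{D}_1, \tag{58}
\end{align*}

then $(R_1, R_2, \tilde{D}_1, D_2) \in \mathcal{RD}^i$.

Proof. Let $\mathcal{P}$ denote the polytope of rate pairs which satisfy the
inequalities (56)-(58). It suffices to show that if $(r_1, r_2)$ is a vertex of $\mathcal{P}$, then
$(r_1, r_2, \tilde{D}_1, D_2) \in \mathcal{RD}^i$. For convenience, let $[x]^+ = \max\{x, 0\}$. There are
only two extreme points of $\mathcal{P}$:

\begin{align*}
r_1^{(1)} &= \big[ H(Y_1 | U_2, Q) - \tilde{D}_1 \big]^+ \\
r_2^{(1)} &= I(Y_2; U_2 | Y_1, Q) + H(Y_1) - \tilde{D}_1 - r_1^{(1)},
\end{align*}

and

\begin{align*}
r_1^{(2)} &= I(Y_2; U_2 | Y_1, Q) + H(Y_1) - \tilde{D}_1 - r_2^{(2)} \\
r_2^{(2)} &= \big[ I(Y_2; U_2 | Y_1, Q) + H(Y_1 | U_1, Q) - \tilde{D}_1 \big]^+.
\end{align*}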
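The two extreme points can be computed mechanically once the three right-hand sides of (56)-(58) are known. The sketch below is illustrative: the function name `extreme_points`, the argument names `a`, `b`, `c` (standing for the right-hand sides of (56), (57), and (58), respectively), and the numerical values are hypothetical. It assumes $c \ge [a]^+ + [b]^+$, so that the sum-rate face is not degenerate.

```python
def extreme_points(a, b, c):
    """Vertices of {(r1, r2): r1 >= a, r2 >= b, r1 + r2 >= c, r1 >= 0, r2 >= 0}.

    Assumes c >= max(a, 0) + max(b, 0).
    """
    def pos(x):
        return max(x, 0.0)   # the [x]^+ operation

    v1 = (pos(a), c - pos(a))   # (r1^(1), r2^(1))
    v2 = (c - pos(b), pos(b))   # (r1^(2), r2^(2))
    return v1, v2


# Hypothetical values: the constraint on r2 is inactive because b < 0.
v1, v2 = extreme_points(a=0.3, b=-0.1, c=0.9)
```

With these hypothetical values, the first vertex has $r_1 > 0$ (the situation of Case 1.2 below) and the second has $r_2 = 0$ (the situation of Case 2.1 below).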



We first analyze the extreme point $(r_1^{(1)}, r_2^{(1)})$:

• Case 1.1: $r_1^{(1)} = 0$. In this case, we have $r_2^{(1)} = I(Y_2; U_2 | Y_1, Q) + H(Y_1) - \tilde{D}_1$.
This can be expressed as:

$$r_2^{(1)} = (1 - \theta) I(Y_2; U_2 | Q),$$

where

$$\theta = \frac{\tilde{D}_1 - I(Y_2; U_2 | Y_1, Q) - H(Y_1) + I(Y_2; U_2 | Q)}{I(Y_2; U_2 | Q)}.$$

Since $r_1^{(1)} = 0$, we must have $\tilde{D}_1 \ge H(Y_1 | U_2, Q)$. This implies that

$$\theta \ge \frac{H(Y_1 | U_2, Q) - I(Y_2; U_2 | Y_1, Q) - H(Y_1) + I(Y_2; U_2 | Q)}{I(Y_2; U_2 | Q)} = 0.$$

Also, we can assume without loss of generality that $\tilde{D}_1 \le H(Y_1)$, hence
$\theta \in [0, 1]$. Applying the Berger-Tung achievability scheme, we can achieve
the following distortions:

\begin{align*}
D_1' &= \theta H(Y_1) + (1 - \theta) H(Y_1 | U_2, Q) \\
&= H(Y_1 | U_2, Q) + \theta I(Y_1; U_2 | Q) \\
&\le H(Y_1 | U_2, Q) + \tilde{D}_1 - I(Y_2; U_2 | Y_1, Q) - H(Y_1) + I(Y_2; U_2 | Q) \tag{59} \\
&= \tilde{D}_1 - I(Y_2; U_2 | Y_1, Q) - I(Y_1; U_2 | Q) + I(Y_2; U_2 | Q) \\
&= \tilde{D}_1,
\end{align*}

where (59) follows since $I(Y_1; U_2 | Q) \le I(Y_2; U_2 | Q)$ by the data processing
inequality, and

\begin{align*}
D_2' &= \theta H(Y_2) + (1 - \theta) H(Y_2 | U_2, Q) \\
&= H(Y_2 | U_2, Q) + \theta I(Y_2; U_2 | Q) \\
&= H(Y_2 | U_2, Q) + \tilde{D}_1 - I(Y_2; U_2 | Y_1, Q) - H(Y_1) + I(Y_2; U_2 | Q) \\
&= H(Y_2) + \tilde{D}_1 - I(Y_2; U_2 | Y_1, Q) - H(Y_1) \\
&= H(Y_2 | Y_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | Y_2) \\
&= H(Y_2 | Y_1, U_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | Y_2) \tag{60} \\
&\le H(Y_2 | Y_1, U_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | Y_2, U_1, U_2, Q) \\
&= H(Y_2 | U_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | U_1, U_2, Q) \\
&\le D_2, \tag{61}
\end{align*}

where (60) follows since $U_1 \leftrightarrow (Y_1, U_2, Q) \leftrightarrow Y_2$, and (61) follows from
(55).



• Case 1.2: $r_1^{(1)} > 0$. In this case, we have $r_2^{(1)} = I(Y_2; U_2 | Y_1, Q) +
I(Y_1; U_2 | Q) = I(Y_2; U_2 | Q)$. Also, we can write $r_1^{(1)}$ as:

$$r_1^{(1)} = (1 - \theta) I(Y_1; U_1 | U_2, Q),$$

where

$$\theta = \frac{\tilde{D}_1 - H(Y_1 | U_2, Q) + I(Y_1; U_1 | U_2, Q)}{I(Y_1; U_1 | U_2, Q)}.$$

Since $r_1^{(1)} > 0$, we must have $\tilde{D}_1 < H(Y_1 | U_2, Q)$. This implies that

$$\theta < \frac{H(Y_1 | U_2, Q) - H(Y_1 | U_2, Q) + I(Y_1; U_1 | U_2, Q)}{I(Y_1; U_1 | U_2, Q)} = 1.$$

Also, (54) implies that $\tilde{D}_1 \ge H(Y_1 | U_1, U_2, Q)$, hence $\theta \in [0, 1]$. Applying
the Berger-Tung achievability scheme, we can achieve the following
distortions:

\begin{align*}
D_1' &= \theta H(Y_1 | U_2, Q) + (1 - \theta) H(Y_1 | U_1, U_2, Q) \\
&= H(Y_1 | U_1, U_2, Q) + \theta I(Y_1; U_1 | U_2, Q) \\
&= H(Y_1 | U_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | U_2, Q) + I(Y_1; U_1 | U_2, Q) \\
&= \tilde{D}_1,
\end{align*}

and

\begin{align*}
D_2' &= \theta H(Y_2 | U_2, Q) + (1 - \theta) H(Y_2 | U_1, U_2, Q) \\
&= H(Y_2 | U_1, U_2, Q) + \theta I(Y_2; U_1 | U_2, Q) \\
&\le H(Y_2 | U_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | U_2, Q) + I(Y_1; U_1 | U_2, Q) \tag{62} \\
&= H(Y_2 | U_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | U_1, U_2, Q) \\
&\le D_2, \tag{63}
\end{align*}

where (62) follows since $I(Y_2; U_1 | U_2, Q) \le I(Y_1; U_1 | U_2, Q)$ by the data
processing inequality, and (63) follows from (55).
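The time-sharing arithmetic of Case 1.2 can be checked with a few lines of code. The sketch below is illustrative only: the function name, the argument names, and the numerical entropy values are hypothetical, and only the formulas for $\theta$, $r_1^{(1)}$, and the first achieved distortion are taken from the case above.

```python
def time_share_case_1_2(h_y1_given_u2, h_y1_given_u1u2, d1_tilde):
    """Time-sharing weight, rate, and achieved distortion for Case 1.2.

    h_y1_given_u2   stands for H(Y1 | U2, Q)
    h_y1_given_u1u2 stands for H(Y1 | U1, U2, Q)
    d1_tilde        stands for the target distortion
    """
    i_y1_u1 = h_y1_given_u2 - h_y1_given_u1u2            # I(Y1; U1 | U2, Q)
    theta = (d1_tilde - h_y1_given_u2 + i_y1_u1) / i_y1_u1
    r1 = (1 - theta) * i_y1_u1
    d1_achieved = theta * h_y1_given_u2 + (1 - theta) * h_y1_given_u1u2
    return theta, r1, d1_achieved


# Hypothetical values with H(Y1|U1,U2,Q) <= target < H(Y1|U2,Q).
theta, r1, d1_achieved = time_share_case_1_2(0.8, 0.3, 0.5)
```

For these hypothetical values, $\theta$ is approximately $0.4$, the rate is approximately $H(Y_1|U_2,Q) - \tilde{D}_1 = 0.3$, and the achieved distortion matches the target $0.5$ up to floating-point error.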

In a similar manner, we now analyze the second extreme point $(r_1^{(2)}, r_2^{(2)})$:

• Case 2.1: $r_2^{(2)} = 0$. In this case, we have $r_1^{(2)} = I(Y_2; U_2 | Y_1, Q) + H(Y_1) - \tilde{D}_1$.
This can be expressed as:

$$r_1^{(2)} = (1 - \theta) I(Y_1; U_1 | Q),$$

where

$$\theta = \frac{\tilde{D}_1 - I(Y_2; U_2 | Y_1, Q) - H(Y_1) + I(Y_1; U_1 | Q)}{I(Y_1; U_1 | Q)}.$$

Since $r_2^{(2)} = 0$, we must have $\tilde{D}_1 \ge H(Y_1 | U_1, Q) + I(Y_2; U_2 | Y_1, Q)$. This
implies that

$$\theta \ge \frac{H(Y_1 | U_1, Q) + I(Y_2; U_2 | Y_1, Q) - I(Y_2; U_2 | Y_1, Q) - H(Y_1) + I(Y_1; U_1 | Q)}{I(Y_1; U_1 | Q)} = 0.$$

Also, we can assume without loss of generality that $\tilde{D}_1 \le H(Y_1)$, hence

$$\theta \le \frac{H(Y_1) - I(Y_2; U_2 | Y_1, Q) - H(Y_1) + I(Y_1; U_1 | Q)}{I(Y_1; U_1 | Q)} \le 1,$$

and therefore $\theta \in [0, 1]$. Applying the Berger-Tung achievability scheme,
we can achieve the following distortions:

\begin{align*}
D_1' &= \theta H(Y_1) + (1 - \theta) H(Y_1 | U_1, Q) \\
&= H(Y_1 | U_1, Q) + \theta I(Y_1; U_1 | Q) \\
&= H(Y_1 | U_1, Q) + \tilde{D}_1 - I(Y_2; U_2 | Y_1, Q) - H(Y_1) + I(Y_1; U_1 | Q) \\
&= \tilde{D}_1 - I(Y_2; U_2 | Y_1, Q) \\
&\le \tilde{D}_1,
\end{align*}

and

\begin{align*}
D_2' &= \theta H(Y_2) + (1 - \theta) H(Y_2 | U_1, Q) \\
&= H(Y_2 | U_1, Q) + \theta I(Y_2; U_1 | Q) \\
&\le H(Y_2 | U_1, Q) + \tilde{D}_1 - I(Y_2; U_2 | Y_1, Q) - H(Y_1) + I(Y_1; U_1 | Q) \tag{64} \\
&= H(Y_2 | Y_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | Y_2, U_1, Q) \\
&= H(Y_2 | Y_1, U_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | Y_2, U_1, U_2, Q) \tag{65} \\
&= H(Y_2 | U_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | U_1, U_2, Q) \\
&\le D_2, \tag{66}
\end{align*}

where (64) follows since $I(Y_2; U_1 | Q) \le I(Y_1; U_1 | Q)$ by the data processing
inequality, (65) follows since $U_1 \leftrightarrow (Y_1, U_2, Q) \leftrightarrow Y_2$ and $U_2 \leftrightarrow
(Y_2, U_1, Q) \leftrightarrow Y_1$, and (66) follows from (55).



• Case 2.2: $r_2^{(2)} > 0$. In this case, we have $r_1^{(2)} = I(Y_1; U_1 | Q)$. Also, we
can write $r_2^{(2)}$ as:

$$r_2^{(2)} = (1 - \theta) I(Y_2; U_2 | U_1, Q),$$

where

$$\theta = \frac{\tilde{D}_1 - H(Y_1 | U_1, Q) - I(Y_2; U_2 | Y_1, Q) + I(Y_2; U_2 | U_1, Q)}{I(Y_2; U_2 | U_1, Q)}.$$

Since $r_2^{(2)} > 0$, we must have $\tilde{D}_1 < H(Y_1 | U_1, Q) + I(Y_2; U_2 | Y_1, Q)$. This
implies that $\theta < 1$. Also, (54) implies that $\tilde{D}_1 \ge H(Y_1 | U_1, U_2, Q)$, yielding

$$\theta \ge \frac{H(Y_1 | U_1, U_2, Q) - H(Y_1 | U_1, Q) - I(Y_2; U_2 | Y_1, Q) + I(Y_2; U_2 | U_1, Q)}{I(Y_2; U_2 | U_1, Q)} = 0.$$

Therefore, $\theta \in [0, 1]$. Applying the Berger-Tung achievability scheme, we
can achieve the following distortions:

\begin{align*}
D_1' &= \theta H(Y_1 | U_1, Q) + (1 - \theta) H(Y_1 | U_1, U_2, Q) \\
&= H(Y_1 | U_1, U_2, Q) + \theta I(Y_1; U_2 | U_1, Q) \\
&\le H(Y_1 | U_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | U_1, Q) - I(Y_2; U_2 | Y_1, Q) + I(Y_2; U_2 | U_1, Q) \tag{67} \\
&= \tilde{D}_1,
\end{align*}

where (67) follows since $I(Y_1; U_2 | U_1, Q) \le I(Y_2; U_2 | U_1, Q)$ by the data
processing inequality, and

\begin{align*}
D_2' &= \theta H(Y_2 | U_1, Q) + (1 - \theta) H(Y_2 | U_1, U_2, Q) \\
&= H(Y_2 | U_1, U_2, Q) + \theta I(Y_2; U_2 | U_1, Q) \\
&= H(Y_2 | U_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | U_1, Q) - I(Y_2; U_2 | Y_1, Q) + I(Y_2; U_2 | U_1, Q) \\
&= H(Y_2 | U_1, U_2, Q) + \tilde{D}_1 - H(Y_1 | U_1, U_2, Q) \\
&\le D_2, \tag{68}
\end{align*}

where (68) follows from (55).

Thus, this proves that the Berger-Tung compression scheme can achieve any
rate distortion tuple $(r_1, r_2, \tilde{D}_1, D_2)$ for $(r_1, r_2) \in \mathcal{P}$. Since $\mathcal{RD}^i$ is, by definition,
the set of rate distortion tuples attainable by the Berger-Tung achievability
scheme, we must have that $(R_1, R_2, \tilde{D}_1, D_2) \in \mathcal{RD}^i$. This proves the lemma. □



F A Lemma for the Daily Double 

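For a given joint distribution $p(y_1, y_2)$ on the finite alphabet $\mathcal{Y}_1 \times \mathcal{Y}_2$, let
$\mathcal{P}(R_1, R_2)$ denote the set of joint pmf's of the form

$$p(q, y_1, y_2, u_1, u_2) = p(q)p(y_1, y_2)p(u_1|y_1, q)p(u_2|y_2, q)$$

which satisfy

\begin{align*}
R_1 &\ge I(Y_1; U_1 | U_2, Q) \\
R_2 &\ge I(Y_2; U_2 | U_1, Q) \\
R_1 + R_2 &\ge I(Y_1, Y_2; U_1, U_2 | Q)
\end{align*}

for given finite alphabets $\mathcal{U}_1, \mathcal{U}_2, \mathcal{Q}$.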

Lemma 10. For $R_1, R_2$ satisfying $R_1 \le H(Y_1)$, $R_2 \le H(Y_2)$, and $R_1 + R_2 \le
H(Y_1, Y_2)$, the infimum

$$\inf_{p \in \mathcal{P}(R_1, R_2)} \big\{ H(Y_1 | U_1, U_2, Q) + H(Y_2 | U_1, U_2, Q) \big\}$$

is attained by some $p^* \in \mathcal{P}(R_1, R_2)$ which satisfies $R_1 + R_2 =
I(Y_1, Y_2; U_1^*, U_2^* | Q^*)$, where $U_1^*, U_2^*, Q^*$ correspond to the auxiliary random
variables defined by $p^*$.

Proof. First, note that the infimum is always attained since $\mathcal{P}(R_1, R_2)$ is
compact and the objective function is continuous on $\mathcal{P}(R_1, R_2)$. Therefore, let
$U_1^*, U_2^*, Q^*$ correspond to the auxiliary random variables which attain the
infimum.

If $H(Y_1 | U_1^*, U_2^*, Q^*) + H(Y_2 | U_1^*, U_2^*, Q^*) = 0$, then we must have
$I(Y_1, Y_2; U_1^*, U_2^* | Q^*) = H(Y_1, Y_2)$. Thus, $R_1 + R_2 = I(Y_1, Y_2; U_1^*, U_2^* | Q^*)$.

Next, consider the case where $H(Y_1 | U_1^*, U_2^*, Q^*) + H(Y_2 | U_1^*, U_2^*, Q^*) > 0$.
Assume for sake of contradiction that $R_1 + R_2 > I(Y_1, Y_2; U_1^*, U_2^* | Q^*)$. For any
$p \in \mathcal{P}(R_1, R_2)$:

$$I(Y_1; U_1 | U_2, Q) + I(Y_2; U_2 | U_1, Q) \le I(Y_1, Y_2; U_1, U_2 | Q).$$

Hence, at most one of the remaining rate constraints can be satisfied with
equality. If none of the rate constraints are satisfied with equality, then define

$$(\tilde{U}_1, \tilde{U}_2) = \begin{cases} (U_1^*, U_2^*) & \text{with probability } 1 - \epsilon \\ (Y_1, Y_2) & \text{with probability } \epsilon. \end{cases}$$

For $\epsilon > 0$ sufficiently small, the distribution $\tilde{p}$ corresponding to the auxiliary
random variables $\tilde{U}_1, \tilde{U}_2, Q^*$ is still in $\mathcal{P}(R_1, R_2)$. However, $\tilde{p}$ satisfies

$$H(Y_1 | \tilde{U}_1, \tilde{U}_2, Q^*) + H(Y_2 | \tilde{U}_1, \tilde{U}_2, Q^*) < H(Y_1 | U_1^*, U_2^*, Q^*) + H(Y_2 | U_1^*, U_2^*, Q^*),$$

which contradicts the optimality of $p^*$.

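Therefore, assume without loss of generality that

\begin{align*}
R_1 &= I(Y_1; U_1^* | U_2^*, Q^*) \\
R_1 + R_2 &> I(Y_1, Y_2; U_1^*, U_2^* | Q^*).
\end{align*}

This implies that $R_2 > I(Y_2; U_2^* | Q^*)$. Now, define

$$\tilde{U}_2 = \begin{cases} U_2^* & \text{with probability } 1 - \epsilon \\ Y_2 & \text{with probability } \epsilon. \end{cases}$$

Note that for $\epsilon > 0$ sufficiently small:

\begin{align*}
I(Y_2; U_2^* | Q^*) &< I(Y_2; \tilde{U}_2 | Q^*) < R_2 \\
I(Y_1, Y_2; U_1^*, U_2^* | Q^*) &< I(Y_1, Y_2; U_1^*, \tilde{U}_2 | Q^*) < R_1 + R_2,
\end{align*}

and for any $\epsilon \in [0, 1]$:

\begin{align*}
R_1 = I(Y_1; U_1^* | U_2^*, Q^*) &\ge I(Y_1; U_1^* | \tilde{U}_2, Q^*) \\
H(Y_1 | U_1^*, U_2^*, Q^*) + H(Y_2 | U_1^*, U_2^*, Q^*) &\ge H(Y_1 | U_1^*, \tilde{U}_2, Q^*) + H(Y_2 | U_1^*, \tilde{U}_2, Q^*). \tag{69}
\end{align*}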
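The effect of the perturbation on the mutual information can be seen in a small numerical example. The sketch below is illustrative and rests on assumptions made here: the switch between $U_2^*$ and $Y_2$ is taken to be independent of everything else and revealed through a tag in the alphabet of $\tilde{U}_2$, the timesharing variable is omitted, and the joint pmf (a uniform bit observed through a binary symmetric channel) and all function names are hypothetical. Under these assumptions, $I(Y; \tilde{U})$ moves linearly from $I(Y; U)$ at $\epsilon = 0$ to $H(Y)$ at $\epsilon = 1$.

```python
import math


def entropy(p):
    """Entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def mutual_information(joint):
    """I(Y; U) in bits from a joint pmf {(y, u): probability}."""
    py, pu = {}, {}
    for (y, u), v in joint.items():
        py[y] = py.get(y, 0.0) + v
        pu[u] = pu.get(u, 0.0) + v
    return entropy(py) + entropy(pu) - entropy(joint)


def perturb(joint, eps):
    """Joint pmf of (Y, U~), where U~ = ('u', U) w.p. 1 - eps and ('y', Y) w.p. eps."""
    out = {}
    for (y, u), v in joint.items():
        out[(y, ('u', u))] = out.get((y, ('u', u)), 0.0) + (1 - eps) * v
        out[(y, ('y', y))] = out.get((y, ('y', y)), 0.0) + eps * v
    return out


# Hypothetical example: Y uniform on {0, 1}, U is Y through a BSC(0.2).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
values = [mutual_information(perturb(joint, k / 10)) for k in range(11)]
```

In this example the entries of `values` increase strictly with $\epsilon$, which mirrors the way the rate constraints in the proof are approached continuously as $\epsilon$ grows.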

Since $R_2 \le H(Y_2)$, as $\epsilon$ is increased from 0 to 1, at least one of the following
must occur:

1. $I(Y_2; \tilde{U}_2 | Q^*) = R_2$.

2. $I(Y_1, Y_2; U_1^*, \tilde{U}_2 | Q^*) = R_1 + R_2$.

3. $I(Y_1; U_1^* | \tilde{U}_2, Q^*) < R_1$.

If either of events 1 or 2 occur first then the sum-rate constraint is met with
equality (since they are equivalent in this case). If event 3 occurs first, then all
rate constraints are satisfied with strict inequality and we can apply the above
argument to contradict optimality of $p^*$. Since (69) shows that the objective is
nonincreasing in $\epsilon$, there must exist a $p \in \mathcal{P}(R_1, R_2)$ which attains the infimum
and satisfies the sum-rate constraint with equality. □